
Abstract

Decompositions make it possible to investigate whether gaps between groups in certain outcomes would remain if groups had comparable characteristics. In practice, however, such a counterfactual comparability is difficult to establish in the presence of lacking common support, functional-form misspecification, and insufficient sample size. In this article, the authors show how decompositions can be undermined by these three interrelated issues by comparing the results of a regression-based Kitagawa-Blinder-Oaxaca decomposition and matching decompositions applied to simulated and real-world data. The results show that matching decompositions are robust to issues of common support and functional-form misspecification but demand a large number of observations. Kitagawa-Blinder-Oaxaca decompositions provide consistent estimates also for smaller samples but require assumptions for model specification and, when common support is lacking, for model-based extrapolation. The authors recommend that any decomposition benefits from using a matching approach first to assess potential problems of common support and misspecification.

Keywords

decomposition, matching, Kitagawa-Blinder-Oaxaca, simulation

A fundamental goal of social stratification research is to understand why there are gaps in socioeconomic outcomes between social groups, for example, in terms of educational attainment (Bernardi and Boertien 2017; Krause, Rinne, and Schüller 2015), political behavior (Dassonneville and Kostelka 2021), or health (Mustapha et al. 2017). Decomposition techniques are a common way to examine these gaps; they allow one to assess to what extent observed differences in group characteristics contribute to observed gaps in outcomes. For example, scholars have used decompositions to show how the gender wage gap can be attributed to differences in various wage determinants between women and men, such as educational attainment, labor market experience, and occupational segregation (Blau and Kahn 2017). To this end, decomposition techniques estimate counterfactual group outcomes as if the group differences in relevant characteristics had been eliminated. The reduction between raw and conditional gaps is the part of the observed gap “explained” by the differences in characteristics between groups, and the remaining “unexplained” part implies either group differences in the returns to these characteristics or unobserved heterogeneity.

Decomposition studies are regularly published by statistical offices, for example, on the gender wage gap in Germany (Mischler 2021) or the European Union (EU) (Leythienne and Pérez-Julián 2022), and they inform policymaking targeted at reducing compositional disadvantages of certain groups. On the premise that comparable relevant characteristics should generate the same outcome across groups, the unexplained component is often interpreted as an indicator of discrimination, which is an important policy concern. As this interpretation requires that no relevant characteristics have been omitted, most decomposition studies focus on including ever extending sets of predictors. However, less attention is given to the bias that may arise from methodological issues of decomposition techniques (Strittmatter and Wunsch 2021). For any decomposition to be informative, a valid estimation of group outcomes is required to establish counterfactual comparability between groups for any fixed set of characteristics. In practice, meeting this requirement is challenging for several interrelated methodological reasons, three of which we discuss in this article.

The first issue is that comparability between groups is often limited. When particular combinations of individual characteristics systematically occur in one group but not the other, this lack of common support suggests a structural noncomparability between groups. Common support can be diminished wherever social processes operate in a group-segregating fashion. Regarding wage gaps by gender, such processes include the prescription of care work to women and closure mechanisms like the glass ceiling, which result in employment patterns that are gendered in terms of working hours, labor market experience, and occupations to the extent that some women and men have no comparable counterpart (Djurdjevic and Radyakin 2007; Goraus, Tyrowicz, and Velde 2017; Ñopo 2008). Similarly, school segregation and “White flight” into private schools in the United States can lead to a lack of common support in terms of educational attainment and educational prestige by race and class (Fairlie 2002; Fiel 2013).

From an intersectional perspective, both theory and evidence suggest common support can be particularly low in contexts in which several dimensions of advantage and disadvantage (e.g., gender, race, nativity, class) coincide in the creation of unique group-specific experiences (Black et al. 2008; Crenshaw 1991; McCall 2005; Sprengholz and Hamjediers 2022). Thus, when studying gaps in particular outcomes, researchers need to investigate how comparable different groups are along intersectional lines (e.g., Black women and White women). For example, immigration and integration processes operate in a gendered and racialized way in Germany, and they produce systematic differences in wage-relevant characteristics on many disaggregation levels: between natives and immigrants, between immigrant men and immigrant women, between EU immigrant women and non-EU immigrant women, and so forth (Morokvasic-Müller 2014).

Linear regression-based decompositions, which are very common in the social sciences (Weichselbaumer and Winter-Ebmer 2005), conceal common support issues and establish comparability via some level of extrapolation in a parametric specification, which might or might not be correct. Comparing the incomparable, however, violates the decomposition logic, and several wage decomposition studies confirm that lacking common support can bias the explained and unexplained parts of the decomposition in either direction (Djurdjevic and Radyakin 2007; Goraus et al. 2017; Nicodemo and Ramos 2012; Ñopo 2008; Strittmatter and Wunsch 2021). Beyond bias, ignoring issues of common support in decompositions can also lead to a loss of information: one may miss the most pronounced structural inequalities between groups and thereby discount them as important mechanisms behind observed gaps.

The second issue this study investigates is insufficient sample size, which is closely related to the issue of common support. Random sampling ensures representativeness on average, but the smaller the sample, the lower the likelihood of sampling comparable individuals in both groups. A lack of common support in a given sample might be a mere consequence of a limited number of observations for particular characteristics. Thus, careful inspection is needed to establish whether a lack of common support is just a sample size artifact or actually indicates systematic noncomparability between groups.

The third issue is functional-form misspecification. Correct specifications of the relationship between the outcome and its predictors are necessary for the estimation of counterfactual group outcomes, which in turn allow the decomposition into explained and unexplained gaps. With limited model flexibility in terms of functional form and interaction effects, the unexplained and explained components can be biased (Bonaccolto-Töpfer and Briel 2022; Strittmatter and Wunsch 2021). This caveat applies even more when we lack common support, as out-of-support extrapolations are completely dependent on the parametric model applied to the observed sample. For example, a glass ceiling that prevents qualified women from reaching managerial positions (Cotter et al. 2001) means such women generate below-potential wage returns to their characteristics (compared with men). Model-based extrapolations that assume wage returns would not change if these women were actually managers are hardly convincing.

In this work, we illustrate the three issues by applying the popular regression-based Kitagawa-Blinder-Oaxaca (KBO) decomposition (Blinder 1973; Kitagawa 1955; Oaxaca 1973) and Ñopo’s (2008) matching decomposition to both simulated and real-world data. We show that both methods can come to different results under common scenarios with lacking common support, with insufficient sample size, and with functional-form misspecification. KBO estimates are specification dependent, whereas matching directly addresses the common support issue and is robust to functional-form misspecification because of its nonparametric nature. Matching, however, suffers from the curse of dimensionality: the smaller the sample size in relation to the detail of characteristics in the matching set, the higher the risk for too few observations for each combination of characteristics. These methods mark the extremes on the parametric spectrum, so we also offer supplementary results of an intermediate approach in which we match on propensity scores that condense group differences in characteristics into one summary measure. This intermediate method is robust to functional-form misspecification and small samples, but limited common support remains a potential issue.

We therefore suggest that scholars first examine potential problems with respect to misspecification and common support and explore the substantive importance of the latter before relying solely on KBO decompositions. From a theoretical perspective, the systematic lack of common support should not be technically concealed, but instead be seen as a starting point to understand the structural noncomparability between groups as an important mechanism behind gaps in outcomes.

Decomposition via KBO and Matching

We are interested in the decomposition of the raw gap in outcome $\bar{Y}$ between group $A$ and group $B$ ,

D = {\bar{Y}}_{B} - {\bar{Y}}_{A},

(1)

where $D$ denotes (dis)advantages of group $B$ compared with the reference group $A$. The goal of the decomposition is to split the observed gap into an “explained” component that is based on differences between groups in the characteristics that predict $Y$ and a remaining “unexplained” component.

KBO

There are several variations of the KBO decomposition, all of which use regression techniques to decompose the raw gap into more or less detailed “explained” and “unexplained” components. We focus on the common (and simplest) twofold decomposition. It builds on (1) group-specific vectors of the mean values ${\bar{X}}_{A}$ and ${\bar{X}}_{B}$ for the specified predictors and (2) group-specific vectors of coefficients ${\hat{β}}_{A}$ and ${\hat{β}}_{B}$ obtained from a regression of the outcome on a set of predictors for each group (equation 2). The mean-differences in predictors represent compositional differences that make up the explained component $D_{X}$ (e.g., wage differences due to differences in labor market experience); differences in the associated regression coefficients represent differences in returns that make up the unexplained component $D_{0}$ (e.g., the same educational attainment having different wage returns for each group):

\begin{aligned} D &= \bar{X}'_{B} \underbrace{({\hat{β}}_{B} - {\hat{β}}_{A})}_{\text{difference in returns}} + \underbrace{({\bar{X}}_{B} - {\bar{X}}_{A})'}_{\text{compositional difference}} {\hat{β}}_{A} \\ &= D_{0} + D_{X} \end{aligned}

(2)

We specify ${\hat{β}}_{A}$ as the counterfactual vector, which results in the following interpretation of the unexplained component for the exercise at hand: the unexplained component reflects the hypothetical gap that would persist if group $A$ had the same characteristics as group $B$ .
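The twofold decomposition in equation (2) can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used for the results reported here: the function name `kbo_twofold`, the toy data, and the use of NumPy's least-squares solver are assumptions made for the example. Because each group-specific regression includes an intercept, the two components sum to the raw gap by construction.

```python
import numpy as np

def kbo_twofold(X_a, y_a, X_b, y_b):
    """Twofold KBO decomposition with group A's coefficients as the counterfactual.

    X_a, X_b: 2-D arrays of predictors (without a constant); y_a, y_b: outcomes.
    Returns (D, D_0, D_X) in the notation of equation (2).
    """
    # Add an intercept so that fitted group means reproduce the observed means.
    Xa = np.column_stack([np.ones(len(y_a)), X_a])
    Xb = np.column_stack([np.ones(len(y_b)), X_b])
    # Group-specific OLS coefficients.
    beta_a, *_ = np.linalg.lstsq(Xa, y_a, rcond=None)
    beta_b, *_ = np.linalg.lstsq(Xb, y_b, rcond=None)
    xbar_a, xbar_b = Xa.mean(axis=0), Xb.mean(axis=0)
    d_unexplained = xbar_b @ (beta_b - beta_a)  # D_0: difference in returns
    d_explained = (xbar_b - xbar_a) @ beta_a    # D_X: compositional difference
    d_raw = y_b.mean() - y_a.mean()
    return d_raw, d_unexplained, d_explained

# Hypothetical toy data: one predictor, lower mean and lower intercept for group B.
rng = np.random.default_rng(0)
n = 5000
x_a = rng.normal(1.0, 1.0, size=(n, 1))
x_b = rng.normal(0.5, 1.0, size=(n, 1))
y_a = 0.5 * x_a[:, 0] + rng.normal(0.0, 1.0, n)
y_b = 0.5 * x_b[:, 0] - 0.3 + rng.normal(0.0, 1.0, n)

D, D0, DX = kbo_twofold(x_a, y_a, x_b, y_b)
```

In this toy setup the explained part should be close to $(0.5 - 1) \cdot 0.5 = -0.25$ and the unexplained part close to the intercept shift of $-0.3$, up to sampling noise.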

Matching

Ñopo (2008) proposed an alternative decomposition technique that builds on a matching approach. In a one-to-many exact matching, each individual from group $B$ is matched to all individuals from group $A$ with the same combination of characteristics (each unique combination of characteristics represents one stratum). For all observations of groups A and B, the matching flags if their characteristics are in common support (matched $m$ ) or out of common support (unmatched $u$ ). With common support, the ratio of $B$ to $A$ units in each stratum can be used to create a reweighted group $A^{B}$ , which has the exact same distribution across all strata as group $B$ . The outcome of this counterfactual group $A^{B}$ can be interpreted in two ways: (1) as the average outcome of group $A$ if it had the same characteristics as group $B$ and (2) as the average outcome of group $B$ if it had the same returns to characteristics as group $A$ . The overall gap can then be additively decomposed into four parts:

\begin{aligned} D &= D_{0} + \overbrace{D_{X} + D_{A} + D_{B}}^{\text{compositional difference}} \\ &= \underbrace{\overbrace{({\bar{Y}}_{B, m} - {\bar{Y}}_{A^{B}, m})}^{D_{0}} + \overbrace{({\bar{Y}}_{A^{B}, m} - {\bar{Y}}_{A, m})}^{D_{X}}}_{\text{splitting difference among matched by reweighted group } A} + \underbrace{D_{A} + D_{B}}_{\text{out of support}} \end{aligned}

(3)

$D_{X}$ is the average gap between the matched units of reweighted group $A^{B}$ and the matched units of group $A$ , which is explained by the fact that groups $A$ and $B$ are differently distributed across matched strata (some sets of characteristics being more likely in one group than the other). $D_{0}$ is the average gap between matched units of group $B$ and the reweighted group $A^{B}$ . Because $B$ and $A^{B}$ are equally distributed across matched strata, $D_{0}$ captures how much of the raw gap remains unexplained by differences in the considered characteristics. $D_{0}$ and $D_{X}$ are analogous to the components of a twofold KBO decomposition, but they only pertain to matched units. When compositional differences between groups limit common support, the effect that unmatched individuals in both groups have on the outcome gap is captured by the components $D_{A}$ and $D_{B}$ :

\begin{aligned} D_{A} &= \underbrace{({\bar{Y}}_{A, m} - {\bar{Y}}_{A, u})}_{\text{gap between matched and unmatched } A} \cdot \underbrace{(N_{A, u} / N_{A})}_{\text{share of unmatched } A} \\ D_{B} &= \underbrace{({\bar{Y}}_{B, u} - {\bar{Y}}_{B, m})}_{\text{gap between unmatched and matched } B} \cdot \underbrace{(N_{B, u} / N_{B})}_{\text{share of unmatched } B} \end{aligned}

(4)

$D_{A}$ is the gap between the averages of the outcome $\bar{Y}$ for the unmatched $u$ and matched $m$ units within group $A$ , weighted by the frequency of unmatched $A$ units $(N_{A, u})$ in relation to the overall size of group $A$ $(N_{A})$ , so that $D_{A}$ approaches zero with fewer observations out of common support. $D_{A}$ denotes how much of the raw gap is due to unmatched $A$ units having higher or lower values in the outcome than matched $A$ units, where $D_{A} < 0$ if the outcome is lower among the matched, and $D_{A} > 0$ if the outcome is lower among the unmatched (reversed for $D_{B}$ ).
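The four components of equations (3) and (4) can be computed directly from stratum-level data. The following sketch uses only the Python standard library; the function name `nopo_decomposition`, the input format (a list of stratum/outcome pairs per group), and the toy numbers are hypothetical choices for illustration rather than the published implementation.

```python
from collections import defaultdict

def _mean(values):
    return sum(values) / len(values)

def nopo_decomposition(group_a, group_b):
    """Exact-matching decomposition following equations (3) and (4).

    group_a, group_b: lists of (stratum, outcome) pairs, where `stratum` is any
    hashable combination of (coarsened) characteristics.
    """
    ya, yb = defaultdict(list), defaultdict(list)
    for stratum, y in group_a:
        ya[stratum].append(y)
    for stratum, y in group_b:
        yb[stratum].append(y)
    common = set(ya) & set(yb)  # strata in common support
    if not common:
        raise ValueError("no common support between the groups")

    a_m = [y for s in common for y in ya[s]]                 # matched A
    b_m = [y for s in common for y in yb[s]]                 # matched B
    a_u = [y for s in ya if s not in common for y in ya[s]]  # unmatched A
    b_u = [y for s in yb if s not in common for y in yb[s]]  # unmatched B

    # Counterfactual A^B: A's stratum means weighted by B's distribution over matched strata.
    y_ab = sum(len(yb[s]) * _mean(ya[s]) for s in common) / len(b_m)

    n_a, n_b = len(a_m) + len(a_u), len(b_m) + len(b_u)
    return {
        "D": _mean(b_m + b_u) - _mean(a_m + a_u),
        "D_0": _mean(b_m) - y_ab,
        "D_X": y_ab - _mean(a_m),
        "D_A": (_mean(a_m) - _mean(a_u)) * len(a_u) / n_a if a_u else 0.0,
        "D_B": (_mean(b_u) - _mean(b_m)) * len(b_u) / n_b if b_u else 0.0,
    }

# Hypothetical toy data: "top" occurs only in A, "none" only in B.
group_a = [("low", 10), ("low", 12), ("high", 20), ("top", 30)]
group_b = [("low", 9), ("low", 9), ("low", 10), ("high", 18), ("none", 4)]
res = nopo_decomposition(group_a, group_b)
```

For these toy data the raw gap of $-8$ splits into $D_{0} = -1.75$, $D_{X} = -0.75$, $D_{A} = -4$, and $D_{B} = -1.5$, which illustrates that the four components add up to $D$ exactly.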

The matching procedure differs from the KBO decomposition in two important ways. First, exact matching estimates the explained $(D_{X})$ and unexplained $(D_{0})$ components nonparametrically, whereas KBO uses linear regression models for each group. Regression models are specification dependent, both in terms of the functional form of the relationship between the outcome and continuous predictors, and in terms of interactions between predictors. This dependency holds when outcome variables are transformed (e.g., when using the logarithm of wages in wage decompositions). Exact matching does not require such assumptions and estimates outcome gaps for all observed combinations of specified characteristics. Matching estimates are thus more robust against functional-form misspecification. A comparable KBO decomposition without parametric assumptions would require a fully saturated model (including categorical variables, continuous variables with all their possible powers, and all interactions).

Second, because $D_{X}$ and $D_{0}$ are estimated only on the common support in the matching decomposition, both terms are unaffected by observations that have no counterpart in the comparison group. This is not true for the KBO decomposition, which raises the question of how such observations affect the relative magnitude of explained and unexplained parts of the gap. Again, the outcome depends on the model specification: values for the outcome can either be extrapolated for unmatched sets of characteristics via linear relationships for continuous variables or via the assumption of global coefficients for categorical variables. However, such an extrapolation involves usually untestable assumptions. Moreover, even in a saturated KBO decomposition, the lack of common support poses a problem because the presence of unmatched units can inflate the explained and unexplained part.

In summary, KBO decompositions are highly conditional on the model specification, a problem that is exacerbated when we lack common support. In the presence of misspecification, the relative magnitudes of “explained” and “unexplained” gaps can be misleading (in both directions). Exact matching is not plagued by these issues, but it suffers from the curse of dimensionality, because the number of characteristics and characteristics’ levels we can sensibly match on is limited by sample size. Performing a matching decomposition on an insufficient sample will create artificial common support issues and inflate the components $D_{A}$ and $D_{B}$ . With a sufficiently sized sample, many unmatched combinations of characteristics would have a match, and the compositional differences between groups over the extended common support would be captured by the explained part $D_{X}$ . If the sample size is fixed, common support can only be enhanced in Ñopo’s matching by further coarsening of predictors, to the detriment of balance in characteristics and, thus, comparability between groups. Propensity scores are an intermediate approach often used to deal with this trade-off; they condense group differences in all predictors of the outcome into one parametrically estimated summary measure. For comparative purposes, we therefore offer additional estimates on the basis of propensity scores’ intermediate position on the parametric spectrum between KBO and Ñopo’s matching.

In the following, we examine the issues of common support, functional-form misspecification, and insufficient sample size using a simulation and a real-world example. In the simulation study, we mirror the usual research process and apply the same model specification to various generated data, some of which aligns with the specification and some of which does not. In the real-world example, we investigate wage gaps in survey data, and show how the results of several KBO specifications relate to the results of a matching decomposition when the data-generating process (DGP) is unknown.

Simulation

Data and Estimands

For the simulation, we set the sample size to 2,000, 10,000, or 50,000 observations, split in half between groups $A$ and $B$ . Our set of variables includes $y$ as the outcome and five uncorrelated determinants, for which we vary the DGP over the course of the simulation. The default DGP is described in Table 1. Three continuous variables are marked by $x_{1}$ , $x_{2}$ , and $x_{3}$ , all having a standard deviation of $1$ . $x_{1}$ follows a normal distribution with a $0.5$ mean difference between groups $A$ and $B$ ; $x_{2}$ follows a strongly right-skewed gamma distribution with a mean of $1$ and no differences between groups; $x_{3}$ is drawn from a normal distribution with a mean of $2$ and also no differences between groups. Finally, two categorical variables are denoted by $x_{d}$ (an indicator variable with 50 percent $x_{d} = 1$ ) and $x_{m}$ (a multinomial variable with three equally likely values). We generate the outcome $y$ in a linear-additive fashion by using the group-specific parameters $β_{A}$ and $β_{B}$ and an uncorrelated, normally distributed error term.

Table 1.

Default Data-Generating Process

Variable	Distribution	Mean/Proportion	S.D.	$β_{A}$	$β_{B}$
$x_{1}$	$\sim \text{normal}(1, 1)$ for $A$	1	1	$0.5$	$0.5$
$x_{1}$	$\sim \text{normal}(0.5, 1)$ for $B$	0.5	1
$x_{2}$	$\sim \text{gamma}(1, 1)$	1	1	$1.0$	$0.7$
$x_{3}$	$\sim \text{normal}(2, 1)$	2	1	$0.2$	$0.2$
$x_{d}$	$\sim \text{binomial}(1, 0.5)$	0.5		$-0.4$	$-0.4$
$x_{m}$	$\sim \text{multinomial}(N, 1, p)$ with $p = (\frac{1}{3}, \frac{1}{3}, \frac{1}{3})$	$x_{m} = 1$: 0.333		$-0.8$	$-0.8$
		$x_{m} = 2$: 0.333		$0.3$	$0.3$
		$x_{m} = 3$: 0.333		$-0.3$	$-0.3$
$y$	$\sim X β_{A} + \text{normal}(0, 2)$ for $A$
$y$	$\sim X β_{B} + \text{normal}(0, 2)$ for $B$

Note: The sample is split into two equally sized groups, $A$ and $B$ .
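To make the default DGP concrete, the following sketch draws one sample according to Table 1. It is an illustration in Python with NumPy, not the Stata code used for the simulations; the function name `simulate_default_dgp` is hypothetical, and two readings of Table 1 are assumptions of the example: the second parameter of each normal distribution is treated as a standard deviation (so the error term has a standard deviation of 2), and the outcome equation has no intercept.

```python
import numpy as np

def simulate_default_dgp(n_per_group, rng):
    """Draw one sample from the default DGP of Table 1 (no intercept assumed)."""
    mean_x1 = {"A": 1.0, "B": 0.5}         # the only compositional difference
    beta_x2 = {"A": 1.0, "B": 0.7}         # the only return that differs by group
    beta_xm = np.array([-0.8, 0.3, -0.3])  # effects of x_m = 1, 2, 3
    data = {}
    for g in ("A", "B"):
        x1 = rng.normal(mean_x1[g], 1.0, n_per_group)
        x2 = rng.gamma(shape=1.0, scale=1.0, size=n_per_group)
        x3 = rng.normal(2.0, 1.0, n_per_group)
        xd = rng.binomial(1, 0.5, n_per_group)
        xm = rng.integers(1, 4, n_per_group)  # values 1, 2, 3, equally likely
        noise = rng.normal(0.0, 2.0, n_per_group)  # assumption: S.D. of 2
        y = (0.5 * x1 + beta_x2[g] * x2 + 0.2 * x3 - 0.4 * xd
             + beta_xm[xm - 1] + noise)
        data[g] = {"x1": x1, "x2": x2, "x3": x3, "xd": xd, "xm": xm, "y": y}
    return data

rng = np.random.default_rng(2024)
sample = simulate_default_dgp(25_000, rng)  # N = 50,000 in total
raw_gap = sample["B"]["y"].mean() - sample["A"]["y"].mean()
```

With 25,000 observations per group, `raw_gap` should land near the expected raw gap of $-0.55$, with some sampling noise.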

We apply the KBO decomposition and the matching decomposition to the same generated data. For the KBO regressions, we specify the continuous variables $x_{1}$ , $x_{2}$ , and $x_{3}$ as linear, and use $x_{m} = 1$ as the reference category for the dummy-set $x_{m}$ . For the matching, we first categorize the continuous variables using deciles and then match on all observed combinations of $x_{1}$ , $x_{2}$ , $x_{3}$ , $x_{d}$ , and $x_{m}$ .¹

The estimands of interest are all decomposition components for each respective method: for KBO, the unexplained $(D_{0})$ and explained component $(D_{X})$ of equation (2), and for matching, the unexplained $(D_{0})$ and all three compositional components $(D_{X}, D_{A}, D_{B})$ of equation (3). As the default DGP in Table 1 generates data with common support for which the applied KBO specification is correct, the following estimand values can be expected for both KBO and matching:

$D_{X} = - 0.25$. The only compositional difference between groups $A$ and $B$ pertains to $x_{1}$, with ${\bar{x}}_{1, A} = 1$ and ${\bar{x}}_{1, B} = 0.5$. As $β_{x_{1}, A} = β_{x_{1}, B} = 0.5$, the $0.5$ points lower mean in $x_{1}$ for group $B$ results in an explained gap of $D_{X} = (0.5 - 1) \cdot 0.5 = - 0.25$ to the disadvantage of group $B$. In practice, $x_{1}$ might represent labor market experience, with lower experience for group $B$, but equal returns to labor market experience for both groups.

$D_{0} = - 0.3$. The only difference in returns between groups $A$ and $B$ pertains to $x_{2}$, with $β_{x_{2}, A} = 1.0$ and $β_{x_{2}, B} = 0.7$. As ${\bar{x}}_{2, A} = {\bar{x}}_{2, B} = 1$, the group-specific returns result in an unexplained gap $D_{0} = 1 \cdot (0.7 - 1.0) = - 0.3$ to the disadvantage of group $B$, because the higher expected outcome in $y$ for group $A$ does not reflect a compositional advantage. An example would be equal levels of educational attainment for both groups, but lower returns to education for group $B$.

$D = - 0.55$ , $D_{A} = D_{B} = 0$ . We do not impose further group differences in the distribution of $x_{3}$ , $x_{d}$ , or $x_{m}$ , nor in their associated $β$ parameters, so the overall gap is $D = (- 0.25) + (- 0.3) = - 0.55$ to the disadvantage of group $B$ . $D_{A}$ and $D_{B}$ should be $0$ because common support is given.

In the following, we keep the decomposition specifications of both methods constant and vary parts of the DGP to highlight how each component in each decomposition is affected when we vary the sample size, induce functional-form misspecification, and curtail common support. All simulations use Stata version 17.0 and run $N_{sim} = 100{,}000$ times to ensure a Monte Carlo standard error less than $0.005$ for the estimates of all components.

Results

Table 2 and Figure 1 present the simulation results. As the specifications of both decomposition methods fit the default DGP, both methods accurately estimate the explained and unexplained part of the gap (panel 1). For the matching, the slight deviation of $D_{0}$ from the expected estimate is only due to coarsening of the continuous predictors. However, one caveat of the matching is that despite compositional components being correct in sum, a substantial part of $D_{X}$ is attributed to $D_{A}$ and $D_{B}$ in smaller samples because of the curse of dimensionality (only 14.3 percent [N = 2,000] to 53.5 percent [N = 10,000] of units are matched; see Appendix Table A2). Note that the nonparametric matching also gives greater uncertainty about the estimate of the unexplained component $D_{0}$ (Figure 1, panel 1).
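A back-of-the-envelope calculation shows why so few units are matched in small samples. With deciles for three continuous variables, a binary variable, and a three-category variable, there are $10 \cdot 10 \cdot 10 \cdot 2 \cdot 3 = 6{,}000$ possible strata. The sketch below assumes, purely for illustration, that units fall uniformly and independently into these strata; this simplification is not part of the simulation design, and the function name `expected_matched_share` is hypothetical.

```python
def expected_matched_share(n_per_group, n_strata):
    """Chance that a unit finds at least one counterpart in the other group,
    assuming units fall uniformly and independently into n_strata cells."""
    return 1.0 - (1.0 - 1.0 / n_strata) ** n_per_group

# Deciles of x1, x2, x3, times the two values of x_d and three values of x_m.
strata_default = 10 * 10 * 10 * 2 * 3

# Each total sample size N is split in half between groups A and B.
shares = {n: expected_matched_share(n // 2, strata_default)
          for n in (2_000, 10_000, 50_000)}
```

Under this simplification, the expected matched share is roughly 15 percent for $N = 2{,}000$, 57 percent for $N = 10{,}000$, and above 98 percent for $N = 50{,}000$, which is in the vicinity of the matched shares of 14.3 percent and 53.5 percent reported for the two smaller samples.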

Table 2.

Simulation Results

Variant of DGP	$N$	Raw $D$	KBO $D_{0}$	KBO $D_{X}$	Ñopo’s Matching $D_{0}$	Ñopo’s Matching $D_{X}$	Ñopo’s Matching $D_{A}$	Ñopo’s Matching $D_{B}$
Default (Table 1)	Expected	–.550	–.300	–.250	–.300	–.250	.000	.000
	2,000	–.550	–.300	–.250	–.309	–.017	–.111	–.112
	10,000	–.550	–.300	–.250	–.310	–.079	–.081	–.081
	50,000	–.550	–.300	–.250	–.310	–.215	–.013	–.012
Functional-form misspecification
$x_{1}^{2}$	Expected	–.675	–.300	–.375	–.300	–.375	.000	.000
	2,000	–.675	–.174	–.501	–.315	–.025	–.216	–.119
	10,000	–.675	–.175	–.500	–.319	–.114	–.164	–.077
	50,000	–.675	–.175	–.500	–.328	–.309	–.031	–.007
$x_{2}^{2}$	Expected	–.850	–.600	–.250	–.600	–.250	.000	.000
	2,000	–.850	–.596	–.253	–.609	–.018	–.111	–.112
	10,000	–.850	–.599	–.251	–.609	–.080	–.081	–.080
	50,000	–.850	–.600	–.250	–.610	–.215	–.013	–.012
$0.6 \cdot x_{d} \cdot x_{1}$ and $x_{d} ~ binomial (1, 0.2)$ for $B$	Expected	–.670	–.300	–.370	–.300	–.370	.000	.000
	2,000	–.670	–.210	–.460	–.312	–.023	–.196	–.139
	10,000	–.670	–.210	–.460	–.313	–.105	–.153	–.100
	50,000	–.670	–.210	–.460	–.315	–.276	–.064	–.015
Lacking common support
$p_{A} = (0, 0.5, 0.5)$ for $x_{m}$	Expected	–.817	–.300	–.517	–.300	–.250	.000	–.267
	2,000	–.816	–.468	–.348	–.310	–.022	–.110	–.374
	10,000	–.817	–.472	–.344	–.309	–.095	–.081	–.332
	50,000	–.817	–.470	–.346	–.310	–.224	–.012	–.270
$x_{3} ~ normal (1, 0.5)$ for $B$	Expected	–.750	–.300	–.450	–.300	–.250	–.200	.000
	2,000	–.749	–.300	–.450	–.308	–.022	–.277	–.142
	10,000	–.750	–.300	–.450	–.308	–.098	–.232	–.112
	50,000	–.750	–.300	–.450	–.311	–.255	–.145	–.039
Adding $x_{4} ~ normal (2, 1)$ , with $β_{A} = β_{B} = 0.2$	Expected	–.550	–.300	–.250	–.300	–.250	.000	.000
	2,000	–.551	–.301	–.250	–.310	–.001	–.117	–.123
	10,000	–.550	–.300	–.250	–.309	–.009	–.116	–.116
	50,000	–.550	–.300	–.250	–.309	–.043	–.099	–.099

Note: $N_{sim} = 100{,}000$; Monte Carlo s.e. $\leq 0.005$; $D$ represents the raw gap between group $A$ and $B$ in the outcome $y$; $D_{0}$ represents the unexplained component; $D_{X}$ represents the explained component (for Ñopo’s matching only among matched); and $D_{A}$ and $D_{B}$ represent group-specific components due to unmatched observations. Numbers in boldface type denote changes in the expected estimates in comparison with the default DGP. See Appendix Table A2 for the share of unmatched units in the matching decomposition and the estimates for a matching decomposition on the basis of propensity scores. DGP = data-generating process; KBO = Kitagawa-Blinder-Oaxaca.

Figure 1.

Deviation of estimated unexplained component $D_{0}$ from the expected value across data-generating processes.

Functional-Form Misspecification

KBO and matching estimates begin to differ once we start to vary the DGP parameters. Our first variation in the DGP is a misspecification due to functional form assumptions: we square $x_{1}$ but keep $β_{x_{1}} = 0.5$ for both groups when generating the outcome. In the default DGP, the mean difference of $0.5$ in $x_{1}$ between groups was behind the explained component $D_{X}$ , and taking its square should also just affect $D_{X}$ (Table 2, panel 2). However, the unexplained component $D_{0}$ remains largely unaffected only in the nonparametric matching setting (Table 2, panel 2). In the KBO decomposition, assuming linearity for the quadratic relationship between $x_{1}$ and $y$ results in a mis-estimated regression coefficient $β_{x_{1}}$ , which in turn biases the relative importance of the unexplained component.

Second, we square $x_{2}$ when generating the outcome. Regarding $x_{2}$ , the groups do not differ in their composition but in the corresponding $β$ coefficients, so squaring $x_{2}$ should only affect the unexplained component. The estimates of both matching and KBO accurately reflect the change in the DGP (Table 2, panel 3). The KBO decomposition is technically misspecified, but the $β_{x_{2}}$ coefficients are equally mis-estimated for groups $A$ and $B$ as they do not differ in $x_{2}$ . Similarly, the matching decomposition estimates all components close to the expected values. Compared with the default DGP, squaring $x_{2}$ increases the empirical standard error of $D_{0}$ for the KBO estimate by about 39 percent and for the matching estimate by about 55 percent (compare Figure 1, panel 3).
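The expected values for the two squared-term variants, and the size of the KBO bias in the first of them, follow from simple moment arithmetic. The sketch below is a worked check rather than part of the simulation code; the helper `second_moment` is hypothetical. It relies on two standard facts: $E[x^{2}] = \mu^{2} + \sigma^{2}$, and, for a normally distributed $x$, $cov(x, x^{2}) = 2 \mu \sigma^{2}$, so that a linear fit of $x^{2}$ on $x$ has slope $2 \mu$.

```python
def second_moment(mean, sd):
    """E[x^2] = mean^2 + sd^2 for any distribution with these two moments."""
    return mean ** 2 + sd ** 2

beta_x1 = 0.5
mu_a, mu_b, sd = 1.0, 0.5, 1.0

# Expected explained component when y depends on x1 squared.
d_x_expected = beta_x1 * (second_moment(mu_b, sd) - second_moment(mu_a, sd))

# Under the linear specification, group A's coefficient on x1 absorbs the
# slope of the linear projection of x1^2 on x1, which is 2 * mu_a.
slope_a = beta_x1 * 2 * mu_a
d_x_kbo = (mu_b - mu_a) * slope_a

# The raw gap is unchanged by the specification, so the remainder goes to D_0.
raw_gap = d_x_expected - 0.3
d_0_kbo = raw_gap - d_x_kbo

# Expected unexplained component when y depends on x2 squared
# (gamma(1, 1) has mean 1 and standard deviation 1).
d_0_expected_x2 = (0.7 - 1.0) * second_moment(1.0, 1.0)
```

These calculations give an expected $D_{X}$ of $-0.375$ against a KBO value of $-0.5$ (and hence a KBO $D_{0}$ of $-0.175$) for the squared $x_{1}$, and an expected $D_{0}$ of $-0.6$ for the squared $x_{2}$, in line with Table 2.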

In a third variation, we reduce the number of cases for which $x_{d} = 1$ in group $B$ (20 percent instead of 50 percent) and add an interaction of $0.6 \cdot x_{d} \cdot x_{1}$ that is equal for both groups. Compared with the default DGP, this variation should only affect decomposition components that reflect compositional differences, as we do not change group-specific $β$ parameters. This is true for the matching decomposition (Table 2, panel 4), which provides an estimate of $D_{0}$ close to the expected estimate. In the KBO decomposition, however, omitting the interaction term affects $D_{0}$ , because the compositional differences between groups in the interacted variables also change the group-specific regression coefficients for these variables. In the case of $β_{x_{1}}$ , the coefficient increases by the share of observations with $x_{d} = 1$ , multiplied with the coefficient of the interaction term: $β_{A, x_{1}}$ increases by $0.5 \cdot 0.6 = 0.3$ for group $A$ , and by $0.2 \cdot 0.6 = 0.12$ for group $B$ . The coefficient $β_{x_{d}}$ is similarly mis-estimated and adds to the bias in $D_{0}$ in the KBO decomposition (Figure 1).²
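The bias from the omitted interaction can be traced with a short calculation. The sketch below is a derivation added for illustration, not code from the simulation. It assumes that $x_{1}$ and $x_{d}$ are independent within each group, as in the DGP, in which case the product $x_{d} \cdot x_{1}$ is linearly approximated by $p \cdot x_{1} + \mu \cdot x_{d} - p \mu$, where $p$ is the share with $x_{d} = 1$ and $\mu$ is the mean of $x_{1}$. The function name `projected_coefs` is hypothetical.

```python
beta_x1, beta_xd, gamma = 0.5, -0.4, 0.6  # main effects and interaction in the DGP
mean_x1 = {"A": 1.0, "B": 0.5}
share_xd = {"A": 0.5, "B": 0.2}

def projected_coefs(group):
    """Coefficients an additive linear model recovers when the interaction is omitted.

    For independent x1 and xd: xd * x1 = p * x1 + mu * xd - p * mu + residual,
    where the residual is uncorrelated with both x1 and xd.
    """
    p, mu = share_xd[group], mean_x1[group]
    return beta_x1 + gamma * p, beta_xd + gamma * mu

b_x1_a, b_xd_a = projected_coefs("A")
b_x1_b, b_xd_b = projected_coefs("B")

# Explained part under the additive specification, using A's coefficients.
d_x_kbo = ((mean_x1["B"] - mean_x1["A"]) * b_x1_a
           + (share_xd["B"] - share_xd["A"]) * b_xd_a)
raw_gap = -0.67  # expected raw gap for this variant (Table 2)
d_0_kbo = raw_gap - d_x_kbo
```

The resulting values of $-0.46$ for $D_{X}$ and $-0.21$ for $D_{0}$ agree with the KBO estimates in Table 2, panel 4.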

Lack of Common Support

Fourth, we induce a lack of common support in the multinomial variable $x_{m}$ by setting the probabilities for the three values of $x_{m}$ for group $A$ to $p_{A} = (0, 0.5, 0.5)$. Of the emerging compositional differences between groups, only one takes effect on the decomposition: the value $x_{m} = 1$ is no longer present in group $A$. The different probabilities for $x_{m} = 2$ and $x_{m} = 3$ between $A$ and $B$ do not matter, as the opposing coefficients $β_{x_{m} = 2}$ and $β_{x_{m} = 3}$ result in a joint effect of zero in the group-specific equations ($A$: $0.3 \cdot 0.5 - 0.3 \cdot 0.5 = 0$; $B$: $0.3 \cdot 0.333 - 0.3 \cdot 0.333 = 0$; compare Table 1). As a result, we should observe a change in the overall gap that is exclusively driven by the common support issue in $x_{m} = 1$,³ which is accurately captured in the matching decomposition’s $D_{B} = - 0.270$ (Table 2, panel 5, N = 50,000). By contrast, the unexplained component is substantially overestimated in the KBO decomposition. The KBO misestimation follows from the omission of $x_{m} = 1$ for group $A$, so $x_{m} = 1$ can neither be the reference category nor have a regression coefficient. Without a regression coefficient ${\hat{β}}_{x_{m} = 1, A}$, all observations with $x_{m} = 1$ in group $B$ are then treated as belonging to the reference category chosen in the regression among group $A$ (in this case $x_{m} = 2$).⁴ Thus, the extent and direction of misestimation is conditional on the chosen reference category and associated regression coefficients.⁵
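The expected value of $D_{B}$ in this variant follows from equation (4). As a worked check under the parameters of Table 1: one third of group $B$ has $x_{m} = 1$ and is therefore unmatched, these units carry the coefficient $-0.8$, and the matched units of group $B$ (with $x_{m} = 2$ or $x_{m} = 3$ in equal shares) have an average $x_{m}$ effect of zero. All other characteristics are distributed identically among matched and unmatched units of group $B$.

```latex
\begin{aligned}
D_{B} &= ({\bar{Y}}_{B, u} - {\bar{Y}}_{B, m}) \cdot (N_{B, u} / N_{B})
       = (-0.8 - 0) \cdot \tfrac{1}{3} \approx -0.267 \\
D     &= -0.550 + D_{B} \approx -0.817
\end{aligned}
```

Both values correspond to the expected row of this variant in Table 2.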

In a fifth variation, we induce a mean difference and a lack of common support in the continuous variable $x_{3}$ by drawing from a normal distribution with mean $1$ (instead of $2$) and standard deviation $0.5$ (instead of $1$) for group $B$. Compared with the default DGP, KBO attributes the $0.2$ increase in $D$ completely to the explained part $D_{X}$ because of the lower mean in $x_{3}$ for group $B$ (Table 2, panel 6). However, as we lack common support at the upper end of $x_{3}$ (see Appendix Figure A1), the KBO estimates rely on extrapolation: we assume that if individuals in group $B$ had higher $x_{3}$ values, they would generate the same returns (per unit) from these unobserved values as they do from the observed values. This is a strong assumption, which might not hold when we systematically lack common support. For example, if group $B$ cannot achieve higher values in $x_{3}$ because of a glass ceiling, it seems unlikely that their wage returns would remain unchanged in the counterfactual situation of them overcoming the glass ceiling, and with changing returns we would see a different value for $D_{0}$. Only the matching decomposition allows us to separate and quantify the effects unmatched individuals have on the gap between groups. In the present example, the component $D_{A}$ indicates that the unmatched have higher outcomes than the matched among group $A$, because there are no matchable individuals in group $B$ at the upper end of the distribution of $x_{3}$; this component accounts for $0.145$ of the overall gap (for $N = 50{,}000$).

Finally, given a usually fixed size of samples, researchers might think about including further determinants in their decomposition, which is another facet of the curse of dimensionality. In a sixth variation of the DGP, we therefore add another independent variable $x_{4}$ (same parameters as $x_{3}$ in the default DGP), which should not change any estimate because $x_{4}$ neither differs in its distribution nor in its $β$ coefficient across groups. Both methods return the expected estimates for all components, but the additional continuous covariate substantially curtails common support in the matching decomposition due to the increase in dimensionality. For N = 2,000, only 1.5 percent of both groups are matched (see Appendix Table A2), which results in a tripled standard error of $D_{0}$ compared with the default DGP.
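
The loss of common support that comes with each additional covariate can be illustrated with exact matching on deciles. The following sketch is a simplified assumption-laden example, not a reproduction of the sixth DGP variant: both groups draw all covariates from the same standard normal distribution, so every unmatched observation is purely an artifact of dimensionality, and the function name `matched_share` is hypothetical.

```python
import bisect
import random
import statistics


def matched_share(n_obs, k, seed=0):
    """Share of group A with at least one exact match in group B
    when k continuous covariates are each coarsened to deciles."""
    rng = random.Random(seed)
    group_a = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(n_obs)]
    group_b = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(n_obs)]

    # Nine decile cut points per covariate, computed on the pooled sample.
    pooled = group_a + group_b
    cuts = [statistics.quantiles([row[j] for row in pooled], n=10) for j in range(k)]

    def stratum(row):
        # A stratum is the tuple of decile indices (0..9) across covariates.
        return tuple(bisect.bisect_right(cuts[j], row[j]) for j in range(k))

    strata_b = {stratum(row) for row in group_b}
    return sum(stratum(row) in strata_b for row in group_a) / n_obs


shares = {k: matched_share(1000, k) for k in (1, 2, 3, 4)}
for k, share in shares.items():
    print(f"{k} covariate(s): {10 ** k} possible strata, matched share of A = {share:.2f}")
```

With a fixed number of observations, the number of possible strata grows tenfold with each added covariate, so the matched share shrinks even though the groups do not differ at all.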

Propensity Scores

Using different ways of coarsening continuous variables is one way to address the curse of dimensionality in a given sample. Instead of deciles, one could use larger quantiles, categorize continuous variables manually, or use coarsened exact matching (Iacus, King, and Porro 2012). However, the common support bought with broad coarsening can lead to a potential misestimation of $D_{0}$ when groups are not balanced after matching and substantial between-group differences in the outcome remain within strata. Propensity scores provide an intermediate solution to this trade-off, as they condense group differences along a set of predictors into a summary measure. To highlight the potential of this approach, we first estimate propensity scores of belonging to group $B$ using the variables of each DGP variant as independent predictors (omitting polynomials and interaction terms), and then match on the propensity scores (coarsened to the level of percentage points) in Ñopo’s decomposition.⁶ In this approach, the unexplained component $D_{0}$ can be (less intuitively) interpreted as the outcome gap that would remain if individuals of both groups had the same probability of belonging to group $B$ .
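
The two steps described above, estimating propensity scores and matching on their coarsened values, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a single predictor drawn as in the fifth DGP variant, fits the logit by plain gradient ascent (step size and number of steps are arbitrary choices), and the names `fit_logit` and `score_bin` are hypothetical.

```python
import math
import random


def fit_logit(xs, ys, steps=500, lr=0.5):
    """Fit P(y = 1 | x) = 1 / (1 + exp(-(intercept + slope * x)))
    by gradient ascent on the mean log-likelihood."""
    intercept = slope = 0.0
    n_obs = len(xs)
    for _ in range(steps):
        grad_i = grad_s = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(intercept + slope * x)))
            grad_i += y - p
            grad_s += (y - p) * x
        intercept += lr * grad_i / n_obs
        slope += lr * grad_s / n_obs
    return intercept, slope


random.seed(3)
n = 500  # illustrative sample size per group (assumption)
x_a = [random.gauss(2.0, 1.0) for _ in range(n)]  # group A
x_b = [random.gauss(1.0, 0.5) for _ in range(n)]  # group B

# Outcome of the logit: membership in group B (1) versus group A (0).
intercept, slope = fit_logit(x_a + x_b, [0] * n + [1] * n)


def score_bin(x):
    """Propensity of belonging to B, coarsened to whole percentage points."""
    p = 1.0 / (1.0 + math.exp(-(intercept + slope * x)))
    return round(100 * p)


bins_a = [score_bin(x) for x in x_a]
bins_b = [score_bin(x) for x in x_b]
common = set(bins_a) & set(bins_b)

share_a = sum(b in common for b in bins_a) / n
share_b = sum(b in common for b in bins_b) / n
print(f"slope = {slope:.2f}, matched share A = {share_a:.2f}, matched share B = {share_b:.2f}")
```

The resulting bins take the place of the strata in the matching decomposition; a coarser or finer step than one percentage point trades common support against balance within bins.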

For all except the fifth variant of the DGP, matching on propensity scores provides accurate estimates for all four decomposition components ($D_{0}$, $D_{X}$, $D_{A}$, and $D_{B}$; right-hand columns of Supplementary Materials Table A2). Thus, the nonparametric matching on propensity scores is robust to functional-form misspecifications in the parametric estimation of propensity scores (omitting polynomials of $x_{1}$ or $x_{2}$ or the interaction between $x_{1}$ and $x_{d}$; variants 1, 2, and 3 of the DGP). The propensity score method also works reasonably well in some cases of limited common support (variants 4 and 6). In the fifth variant of the DGP, however, the lack of common support in the continuous variable $x_{3}$ is not fully captured by component $D_{A}$ (Supplementary Materials Table A2, panel 5). The underestimation of $D_{A}$ stems from the fact that propensity scores compress the distribution of all predictors into the scale of probabilities. Group differences at the upper end of $x_{3}$ (Appendix Figure A1) translate into small absolute differences in the propensity score, which are partly discounted by the coarsening to percentage points. The remaining imbalance in $x_{3}$ between matched groups leads to bias in the other decomposition components, and one would have to adjust the coarsening of propensity scores to ensure balance after matching.⁷ Nevertheless, as $D_{A}$ is still substantially different from zero, the lack of common support is not completely concealed and can be investigated. Note that common support issues among interactions of variables would be detected only if these interactions were also modeled when predicting propensity scores. These caveats suggest that matching on propensity scores is not a substitute for, but a complement to, matching on observed characteristics, as originally proposed by Ñopo (2008).

Overall, the simulation results highlight several issues that affect the relative size of explained and unexplained components in the different decomposition methods. Because of its nonparametric nature, a Ñopo decomposition via matching is less model dependent than KBO, and therefore less prone to misspecification when variables differ between compared groups (see the first and third variants of the DGP). When we lack common support, matching does not just offer a robust estimation of the unexplained component, but allows one to explore common support issues as a potential mechanism behind the observed gaps in outcomes. However, matching is sensitive with respect to the number of matching variables, their levels, and the number of observations, which also affects the uncertainty of estimates. The components that capture compositional differences—the classic explained component $D_{X}$ and the out-of-support components $D_{A}$ and $D_{B}$ —are especially likely to vary with sample size. $D_{A}$ and $D_{B}$ can be meaningful (Table 2, panels 4 and 5), but they can also just be an artifact of the curse of dimensionality (compare components across sample sizes). A decomposition that matches on propensity scores provides an intermediate approach that can serve as a robustness check for functional-form misspecification in the KBO decomposition and for the estimate uncertainty in the matching decomposition in cases of small samples or many predictors. As the simulation estimates are conditional on the parameters of each DGP chosen to illustrate specific issues, they are not necessarily a good benchmark for the results and bias to expect given real data, which will vary greatly in terms of direction, magnitude, and statistical significance.

Application to Real Data

To examine if and how the same issues occur in real-world situations, we use an example from our previous work (Sprengholz and Hamjediers 2022) and apply matching, KBO, and the intermediate approach via propensity scores to the wage gap between immigrant women (group $B$ ) and native men (group $A$ ) in Germany. Gender and nativity wage gaps are well documented, but less is known about how these dimensions intersect in the production of wage (dis)advantages for immigrant women, immigrant men, native women, and native men. Research suggests individual characteristics differ systematically and jointly by gender and nativity—notably in terms of work attachment, individual resources, and occupations—and the wage returns to these characteristics might vary by group as well. The heterogeneity in experiences calls for analyses with intersectional groups and implies limited common support. In our example, some immigrant women and native men likely have no counterpart in the respective other group with whom to compare their wages.

Sample and Specification

We use German Socio-economic Panel data (version 33.1; Goebel et al. 2019), which is a representative, annual panel of households. We restrict our sample to individuals in private households, aged 21 to 60 years, who are employed and not in education, and reside in western Germany. To boost the sample size, we use waves from 2013 to 2019 and ensure independence across observations by repeatedly selecting one observation per individual from the panel at random. We apply bootstrapping techniques to average across random draws and to estimate standard errors.⁸ We define immigrant women as women living in Germany with a foreign country of birth (first-generation immigrants; average sample size across bootstrapped draws from the panel: N = 2,905) and compare them to native men, who are German-born like their parents (average sample size across bootstrapped draws from the panel: N = 6,049; second-generation immigrants are excluded).

We compute individual hourly gross wages from inflation-adjusted monthly gross labor earnings in euros and actual working hours per week.⁹ Our predictors map labor market experience, working hours, educational attainment, and occupations as the most important factors behind wage gaps by gender and nativity, and we include age and marital status as further controls. For the matching decomposition, we coarsen the respective variables to keep the number of dimensions feasible. In our final analysis sample, we match on 1,470 strata of unique sets of observed characteristics, on average (see Appendix Table A1). Table 3 shows that matching balanced the groups’ characteristics fairly well. There is little difference between $B, m$ and $A^{B}, m$ in the means of continuous variables from the matching set, indicating the applied coarsening is sufficiently fine grained to achieve comparability between immigrant women and native men.

Table 3.

Descriptive Table After Ñopo Matching Decomposition

Mean for	Native Men ( $A$ )			Immigrant Women ( $B$ )
Mean for	$A, u$	$A, m$	$A^{B}, m$	$B, m$	$B, u$
Hourly wage (gross)	18.94	19.86	17.08	14.79	11.11
Matching variables
Age	44.78	41.71	39.57	39.77	42.94
Married	.51	.54	.53	.53	.72
Educational attainment
Up to A-levels	.10	.06	.14	.14	.43
Vocational	.71	.50	.50	.50	.33
Tertiary	.19	.44	.37	.37	.24
Labor market experience	21.26	17.21	13.41	12.39	11.08
Occupations	Not shown	Not shown	Not shown	Not shown	Not shown
Part-time employed (<35 hours/week)	.08	.07	.28	.28	.73

Source: German Socio-economic Panel 2013 to 2019.

Note: Mean values based on 1,000 bootstrapping samples with random selection of one observation of each individual in the panel of 2013 to 2019. $u$ ( $m$ ) denotes unmatched (matched) units. $A^{B}$ refers to weighted group $A$ to match the characteristics of group $B$ .
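
The five columns of Table 3 can be computed from stratum labels and outcomes alone. The sketch below is a minimal illustration under the assumption of exact matching on strata, with matched group $A$ reweighted to the stratum distribution of matched group $B$ as stated in the table note; the function name `matched_means` and the toy data are hypothetical and unrelated to the German Socio-economic Panel.

```python
from collections import defaultdict


def mean(values):
    return sum(values) / len(values) if values else float("nan")


def matched_means(group_a, group_b):
    """group_a and group_b are lists of (stratum, outcome) pairs.
    Returns mean outcomes by group and matching status."""
    common = {s for s, _ in group_a} & {s for s, _ in group_b}

    a_matched = [(s, y) for s, y in group_a if s in common]
    b_matched = [(s, y) for s, y in group_b if s in common]
    a_unmatched = [y for s, y in group_a if s not in common]
    b_unmatched = [y for s, y in group_b if s not in common]

    # Mean outcome of matched A within each stratum.
    a_by_stratum = defaultdict(list)
    for s, y in a_matched:
        a_by_stratum[s].append(y)

    # A reweighted to B: each matched B unit contributes the mean outcome
    # of the A units that share its stratum.
    a_reweighted = mean([mean(a_by_stratum[s]) for s, _ in b_matched])

    return {
        "A,u": mean(a_unmatched),
        "A,m": mean([y for _, y in a_matched]),
        "A^B,m": a_reweighted,
        "B,m": mean([y for _, y in b_matched]),
        "B,u": mean(b_unmatched),
    }


# Toy data: strata "x" and "y" are common, "z" exists only in A, "w" only in B.
toy_a = [("x", 10), ("x", 12), ("y", 20), ("z", 30)]
toy_b = [("x", 8), ("y", 15), ("y", 17), ("w", 5)]
result = matched_means(toy_a, toy_b)
print(result)
```

In the toy data, matched $A$ has a mean of 14, but weighting its stratum means (11 and 20) by the distribution of matched $B$ (one unit in "x", two in "y") yields 17, which is the counterfactual mean reported in the $A^{B}, m$ column.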

We compare the decomposition results of Ñopo’s matching to the results of different KBO decompositions and a decomposition in which we apply Ñopo’s matching to propensity scores. All specifications use the variables of the matching set as wage predictors, but in several variations. KBO1 includes age, labor market experience, and labor market experience squared as continuous measures. KBO2 relaxes functional form assumptions by including only the coarsened categorical measures of the matching decomposition. KBO3 is a fully interacted version of KBO2, which captures potentially different returns of each characteristic across all strata (e.g., different returns to education across occupations and vice versa). KBO4 is KBO3 with common support (mathematically equivalent to Ñopo), and KBO5 is KBO1 with common support. The PS-Match approach estimates propensity scores with the specification of KBO1 as the predictor function for group membership, rounds the scores to quarter percentage points (a coarsening that ensured sufficient balance after matching), and matches on the scores in Ñopo’s decomposition.

Results

Table 4 provides the decomposition results. The value for the raw wage gap $D$ shows that immigrant women $(B)$ earn, on average, $6.61$ euros less per hour than native men $(A)$ .

Table 4.

Matching Decomposition versus KBO Decomposition

	$D$	$D_{0}$	$D_{X}$	$D_{A}$	$D_{B}$	$n_{A}$	$\% m_{A}$	$n_{B}$	$\% m_{B}$
Immigrant women ( $B$ ) – native men ( $A$ )
Ñopo	–6.61^a	–2.28^a	–2.79^a	.54	–2.07^a	6,049	41.65	2,905	43.67
Ñopo	[–7.23; –5.97]	[–3.38; –1.16]	[–3.93; –1.48]	[–.17; 1.20]	[–2.71; –1.43]
PS-Match +Ñopo	–6.61^a	–2.29^a	–3.78^a	.12	–.66^a,b	6,049	95.76	2,905	78.79
PS-Match +Ñopo	[–7.23; –5.97]	[–3.64; –1.17]	[–5.03; –2.38]	[–.06; .48]	[–1.07; –.29]
KBO1: continuous	–6.61^a	–2.10^a	–4.51^a,b
KBO1: continuous	[–7.23; –5.97]	[–3.13; –1.03]	[–5.51; –3.58]
KBO2: matching category	–6.61^a	–2.49^a	–4.11^a,b
KBO2: matching category	[–7.23; –5.97]	[–3.47; –1.44]	[–5.05; –3.15]
KBO3: interacted matching category	–6.61^a	–1.24	–5.37
KBO3: interacted matching category	[–7.23; –5.97]	[–7.30; 1.83]	[–8.39; .50]
KBO4: KBO3 with common support	–5.07^a,b	–2.28^a	–2.79^a
KBO4: KBO3 with common support	[–6.20; –3.96]	[–3.38; –1.16]	[–3.93; –1.48]
KBO5: KBO1 with common support	–5.07^a,b	–2.41^a	–2.66^a
KBO5: KBO1 with common support	[–6.20; –3.96]	[–3.40; –1.35]	[–3.71; –1.56]

Source: German Socio-economic Panel 2013 to 2019.

Note: Absolute differences in euros and 95 percent confidence intervals based on 1,000 bootstrapping samples with random selection of one observation of each individual in the 2013 to 2019 panel. Matching done on age (4 categories), marital status (dummy), education (3 categories), labor market experience (3 categories), part-time employment (dummy), and occupation (39 categories). KBO1 includes age (continuous), married (dummy), education (3 categories), labor market experience (continuous), labor market experience squared, and occupation (39 categories) as regressors. KBO2 uses all the categorical matching variables as regressors. KBO3 is a fully interacted version of KBO2. KBO4 is the same as KBO3, and KBO5 the same as KBO1, but estimated on common support only. $\% m_{A}$ and $\% m_{B}$ are the matched share among groups $A$ and $B$, respectively.

^a p < 0.05.

^b p < 0.05 for the difference from the estimate of the Ñopo decomposition.

For Ñopo’s matching, we observe limited common support, with a little over $40$ percent of both immigrant women and native men being matched. The negative value of $D_{B}$ indicates that among immigrant women, the matched earn, on average, significantly higher wages than the unmatched, which accounts for $- 2.07$ euros (31.3 percent) of the wage gap between immigrant women and native men. Among native men, the matched also have a wage advantage over the unmatched, but $D_{A} = + 0.54$ does not reach statistical significance. To judge whether the substantial lack of common support is a mere artifact of insufficient sample size or indeed systematic, we suggest two complementary analyses. First, we address the curse of dimensionality by compressing group differences across strata into propensity scores. Common support is still limited between groups when matching on propensities ($\% m_{A} = 95.8$ and $\% m_{B} = 78.8$ for PS-Match +Ñopo in Table 4), and $D_{B} = - 0.66$ is smaller in magnitude but remains significant (even $D_{A}$’s confidence interval barely includes zero). Second, in the original study from which this example is taken, we check for common support in the German Microcensus, which has a much larger sample and information on the matching set, but not on wages. We find the lack of common support between immigrant women and native men, although smaller, extends to this data set. The additional checks suggest the lack of common support between immigrant women and native men is systematic and meaningful in the explanation of wage gaps as captured by $D_{A}$ and $D_{B}$. These components are, therefore, an important complement to $D_{X}$, which attributes $- 2.79$ euros ($42.2$ percent) of the wage gap to compositional differences between immigrant women and native men who could be matched.

Comparing the distribution of characteristics by group and matching status (Table 3) can point to mechanisms behind the partial noncomparability of immigrant women and native men and the mechanisms behind the compositional differences that remain among the matched. For example, we observe the strongest difference between unmatched $(B, u)$ and matched $(B, m)$ immigrant women for educational attainment and working part-time. Together with labor market experience, these variables also exhibit the most pronounced differences between matched immigrant women $(B, m)$ and matched native men $(A, m)$ , which are reflected in $D_{X}$ .¹⁰ In Sprengholz and Hamjediers (2022), we explore this in much more detail; for instance, we show that both groups also differ systematically in terms of occupations. Among both groups, very specific persons stay unmatched (predominantly “cleaners and helpers” among immigrant women, and blue-collar workers among native men), which helps us understand why the unmatched earn, on average, lower wages than the matched in both groups.

Finally, after accounting for $D_{A}$, $D_{B}$, and $D_{X}$, the remaining unexplained part $D_{0}$ amounts to $- 2.28$ euros ($34.5$ percent) of the raw wage gap in the matching decomposition (Table 4; compare also the mean wages of $A^{B}, m$ and $B, m$ in Table 3). In comparison, $D_{0}$ is generally quite similar and not significantly different in any of the KBO decompositions in Table 4. Coarsening continuous variables in KBO2 does not indicate a bias due to linearity assumptions in KBO1. One might also fully interact KBO2 for the most flexible specification in KBO3, but doing so leads to strong deviations from the other results when we lack common support, because the estimates depend on the selected reference categories (see simulation results for the fourth DGP variant and note 4). Limiting the fully interacted specification to common support (KBO4) is equivalent to the Ñopo decomposition. With common support, functional form assumptions (KBO5) still make little difference in the results.

Thus, in the present case, neither functional-form decisions nor limited common support lead to KBO estimates of $D_{0}$ that differ from the matching estimates beyond sampling uncertainty. However, we lose valuable information on the wage returns of unmatched characteristics in the compared groups, because $D_{A}$ and $D_{B}$ of the matching decomposition are subsumed in KBO’s $D_{X}$ under strong extrapolation assumptions.
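
The matching components in Table 4 can be approximately recovered from the mean wages in Table 3 and the matched shares. The following sketch assumes the standard four-part identity of the matching decomposition, $D = D_{0} + D_{X} + D_{A} + D_{B}$, with the gap defined as group $B$ minus group $A$; all numbers are copied from Tables 3 and 4, and small deviations from the published values are expected because the inputs are rounded bootstrap averages.

```python
# Mean hourly wages from Table 3.
wage = {"A,u": 18.94, "A,m": 19.86, "A^B,m": 17.08, "B,m": 14.79, "B,u": 11.11}

# Matched shares from Table 4 (Ñopo row), expressed as proportions.
matched_a = 0.4165
matched_b = 0.4367

# Assumed component definitions (gap defined as B minus A):
d_0 = wage["B,m"] - wage["A^B,m"]                    # unexplained, on common support
d_x = wage["A^B,m"] - wage["A,m"]                    # composition, on common support
d_a = (1 - matched_a) * (wage["A,m"] - wage["A,u"])  # unmatched members of A
d_b = (1 - matched_b) * (wage["B,u"] - wage["B,m"])  # unmatched members of B
d = d_0 + d_x + d_a + d_b

print(f"D_0 = {d_0:.2f}, D_X = {d_x:.2f}, D_A = {d_a:.2f}, D_B = {d_b:.2f}, D = {d:.2f}")

# Shares of the raw gap, using the estimates reported in Table 4.
reported = {"D_0": -2.28, "D_X": -2.79, "D_A": 0.54, "D_B": -2.07}
raw_gap = -6.61
shares = {name: 100 * value / raw_gap for name, value in reported.items()}
for name, share in shares.items():
    print(f"{name}: {share:.1f} percent of the raw gap")
```

The negative share for $D_{A}$ reflects that this component works against the overall gap: unmatched native men earn less than matched native men, so removing them from the comparison would widen rather than narrow the gap.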

Discussion

Decompositions allow us to investigate whether outcome gaps between groups would remain if groups had comparable characteristics. However, such a counterfactual comparability is hard to achieve in practice. In this article, we compared the common, regression-based KBO decomposition to Ñopo’s matching decomposition and showed that both methods can come to different results under common scenarios of limited common support, functional-form misspecification, and insufficient sample size. The consistency of KBO decomposition results depended on the model specification and on common support between compared groups. Although the nonparametric matching decomposition was robust against both issues, it suffered from the curse of dimensionality in smaller samples, where lack of common support was an artifact of insufficient sample size but had no substantive meaning. Matching decompositions are thus rather agnostic about the distribution of the specified characteristics and their relationship with the outcome, but they demand a large number of observations to populate all strata. KBO decompositions, on the other hand, provide consistent estimates for samples of any size, but they require assumptions for model specification and, when we lack common support, for model-based extrapolation.

We therefore recommend starting any investigation with a matching decomposition that uses a set of the most important determinants to gauge potential common support issues. If either component $D_{A}$ or $D_{B}$ is substantial, further analyses should explore the reasons behind lacking common support and associated outcome gaps between the matched and unmatched (e.g., how matching status varies across [combinations of] characteristics or if common support issues extend to larger data sets). The main goal behind such an exploration is to assess to what extent limited common support indicates systematic noncomparability between groups, which should not be technically concealed, but instead should be understood as an important mechanism behind the gaps in observed outcomes. Finally, the matching estimates can be compared with the results of a KBO decomposition that uses the matching set as predictors. In this way, we learn to what extent common support issues or functional-form misspecification might bias KBO results. In our real-world example, the KBO estimates did not substantially differ from the matching estimates, but there is no way to know the magnitude and direction of potential misestimation beforehand. As Ñopo’s matching is very data demanding, a valuable intermediate approach is to apply Ñopo’s matching to propensity scores, which alleviates both dimensionality and misspecification concerns, but still flags the most pronounced common support issues in the data.

That said, beyond common support, insufficient sample size, and functional-form misspecification, our investigation does not speak to further problems such as omitted variable bias or selection (e.g., into employment in wage decompositions). All these potential issues need to be considered in decomposition analyses, especially when the remaining unexplained component is interpreted as an indicator of discrimination. For such an interpretation to be meaningful, one would also have to consider the “pre-outcome” discrimination behind group differences in the explanatory variables used in the decomposition.

On a more general level, the main takeaway of our methodological discussion for the social sciences is the invitation to carefully consider the broader issues of comparability and generalizability in our work. In decomposition analyses, separating the effects of differences in characteristics from the effects of differences in returns to these characteristics is only possible when a counterfactual comparability of groups can be established. Clearly, assessment of the extent to which groups are actually comparable should be the basis of any estimation of conditional gaps in outcomes, a point that extends to the estimation of treatment effects in experimental settings. For example, audit studies work with a small slice of social reality to manipulate a small set of social categories (e.g., gender and nativity in an otherwise fixed curriculum vitae). In such a setting, it is straightforward to estimate gaps in a particular outcome between groups (e.g., callbacks for job applications), but it remains unclear how these differences would look outside the fixed slice of social reality of the experiment. When we have common support issues in observational data, we deal with the same problem, which would be hidden by technically establishing comparability via some parametric specification. It is therefore our general recommendation for any (and especially intersectional) social science work to think about how plausible variation in a “treatment” really is in the specific setting at hand (Lundberg 2022). Although most systematic differences between groups are fortunately not immutable, comparing the incomparable is always a precarious stretch.

Supplemental Material

Supplemental material, sj-pdf-1-smx-10.1177_00811750231169729 for Comparing the Incomparable? Issues of Lacking Common Support, Functional-Form Misspecification, and Insufficient Sample Size in Decompositions by Maik Hamjediers and Maximilian Sprengholz in Sociological Methodology

Correction (May 2025):

This article has been updated to correct the following: In Figure 1, the coefficients have been revised to match those reported in Table 2. Additionally, the coefficient for DB on p. 354 has been corrected from −0.256 to −0.270, and the coefficient for DA on p. 355 from 0.146 to 0.145.

Data Note

Information about data access and all code necessary to replicate the results is provided at .

Author Biographies

Maik Hamjediers is a research associate and doctoral candidate in the Department of Social Sciences at Humboldt-Universität zu Berlin. His research focuses on gender inequalities in the labor market and family and quantitative methodology. His work has been published in the European Sociological Review, Work & Occupations, and Social Sciences.

Maximilian Sprengholz is a doctoral candidate in the Department of Social Sciences at Humboldt-Universität zu Berlin and a research associate at the Berliner Institut für Empirische Integrations- und Migrationsforschung. His research focuses on social and political inequalities at the intersections of gender, nativity, and class. His work has been published in the Socio-economic Review, the European Journal of Women’s Studies, and Work & Occupations.

References

Bernardi, Fabrizio, and Diederik Boertien. 2017. “Non-intact Families and Diverging Educational Destinies: A Decomposition Analysis for Germany, Italy, the United Kingdom and the United States.” Social Science Research 63:181–91.

Black, Dan A., Amelia M. Haviland, Seth G. Sanders, and Lowell J. Taylor. 2008. “Gender Wage Disparities among the Highly Educated.” Journal of Human Resources 43(3):630–59.

Blau, Francine D., and Lawrence M. Kahn. 2017. “The Gender Wage Gap: Extent, Trends, and Explanations.” Journal of Economic Literature 55(3):789–865.

Blinder, Alan S. 1973. “Wage Discrimination: Reduced Form and Structural Estimates.” Journal of Human Resources 8(4):436–55.

Bonaccolto-Töpfer, Marina, and Stephanie Briel. 2022. “The Gender Pay Gap Revisited: Does Machine Learning Offer New Insights?” Labour Economics 78:102223.

Cotter, David A., Joan M. Hermsen, Seth Ovadia, and Reeve Vanneman. 2001. “The Glass Ceiling Effect.” Social Forces 80(2):655–81.

Crenshaw, Kimberle. 1991. “Mapping the Margins: Intersectionality, Identity Politics, and Violence against Women of Color.” Stanford Law Review 43(6):1241–99.

Dassonneville, Ruth, and Filip Kostelka. 2021. “The Cultural Sources of the Gender Gap in Voter Turnout.” British Journal of Political Science 51(3):1040–61.

Djurdjevic, Dragana, and Sergiy Radyakin. 2007. “Decomposition of the Gender Wage Gap Using Matching: An Application for Switzerland.” Swiss Journal of Economics and Statistics 143(4):365–96.

Fairlie, Robert W. 2002. “Private Schools and ‘Latino Flight’ from Black Schoolchildren.” Demography 39(4):655–74.

Fiel, Jeremy E. 2013. “Decomposing School Resegregation: Social Closure, Racial Imbalance, and Racial Isolation.” American Sociological Review 78(5):828–48.

Goebel, Jan, Markus M. Grabka, Stefan Liebig, Martin Kroh, David Richter, Carsten Schröder, and Jürgen Schupp. 2019. “The German Socio-economic Panel (SOEP).” Jahrbücher für Nationalökonomie und Statistik 239(2):345–60.

Goraus, Karolina, Joanna Tyrowicz, and Lucas Velde. 2017. “Which Gender Wage Gap Estimates to Trust? A Comparative Analysis.” Review of Income and Wealth 63(1):118–46.

Iacus, Stefano M., Gary King, and Giuseppe Porro. 2012. “Causal Inference without Balance Checking: Coarsened Exact Matching.” Political Analysis 20(1):1–24.

Jann, Ben. 2008. “The Blinder-Oaxaca Decomposition for Linear Regression Models.” Stata Journal 8(4):453–79.

Kitagawa, Evelyn M. 1955. “Components of a Difference between Two Rates.” Journal of the American Statistical Association 50(272):1168–94.

Krause, Annabelle, Ulf Rinne, and Simone Schüller. 2015. “Kick It Like Ozil? Decomposing the Native-Migrant Education Gap.” International Migration Review 49(3):757–89.

Leythienne, Denis, and Marina Pérez-Julián. 2022. “Gender Pay Gaps in the European Union: A Statistical Analysis.” Luxembourg: Publications Office of the European Union.

Lundberg, Ian. 2022. “The Gap-Closing Estimand: A Causal Approach to Study Interventions That Close Disparities across Social Categories.” Sociological Methods & Research. doi:10.1177/00491241211055769.

McCall, Leslie. 2005. “The Complexity of Intersectionality.” Signs: Journal of Women in Culture and Society 30(3):1771–1800.

Mischler, Frauke. 2021. “Verdienstunterschiede zwischen Männern und Frauen: Eine Ursachenanalyse auf Grundlage der Verdienststrukturerhebung 2018.” WISTA—Wirtschaft und Statistik 4:110–25.

Morokvasic-Müller, Mirjana. 2014. “Integration: Gendered and Racialized Constructions of Otherness.” Pp. 165–84 in Contesting Integration, Engendering Migration: Theory and Practice, edited by Anthias and Pajnik. London: Palgrave Macmillan.

Mustapha, J. A., Bryan T. Fisher, Sr., John A. Rizzo, Jie Chen, Brad J. Martinsen, Harry Kotlarz, Michael Ryan, et al. 2017. “Explaining Racial Disparities in Amputation Rates for the Treatment of Peripheral Artery Disease (PAD) Using Decomposition Methods.” Journal of Racial and Ethnic Health Disparities 4:784–95.

Nicodemo, Catia, and Raul Ramos. 2012. “Wage Differentials between Native and Immigrant Women in Spain: Accounting for Differences in Support.” International Journal of Manpower 33(1):118–36.

Ñopo, Hugo. 2008. “Matching as a Tool to Decompose Wage Gaps.” Review of Economics and Statistics 90(2):290–99.

Oaxaca, Ronald. 1973. “Male-Female Wage Differentials in Urban Labor Markets.” International Economic Review 14(3):693–709.

Sprengholz, Maximilian, and Maik Hamjediers. 2022. “Intersections and Commonalities: Using Matching to Decompose Wage Gaps by Gender and Nativity in Germany.” Work & Occupations. doi:10.1177/07308884221141100.

Strittmatter, Anthony, and Conny Wunsch. 2021. “The Gender Pay Gap Revisited with Big Data: Do Methodological Choices Matter?” arXiv. Retrieved April 9, 2023. http://arxiv.org/abs/2102.09207.

Weichselbaumer, Doris, and Rudolf Winter-Ebmer. 2005. “A Meta-analysis of the International Gender Wage Gap.” Journal of Economic Surveys 19(3):479–511.

Yun, Myeong-Su. 2005. “A Simple Solution to the Identification Problem in Detailed Wage Decompositions.” Economic Inquiry 43(4):766–72.
