Implementing blopmatching in Stata

Abstract

The blopmatching estimator for average treatment effects in observational studies is a nonparametric matching estimator proposed by Díaz, Rau, and Rivera (2015, Review of Economics and Statistics 97: 803–812). This approach uses the solutions of linear programming problems to build the weighting schemes that are used to impute the missing potential outcomes. In this article, we describe blopmatch, a new command that implements these estimators.

Keywords

st0632, blopmatch, average treatment effects, matching, linear programming, synthetic covariate

1 Introduction

In this article, we provide an introduction to the “blopmatching” estimators for average treatment effects proposed by Díaz, Rau, and Rivera (2015) and describe blopmatch, a new command that implements these estimators.

The blopmatching approach imputes the missing potential outcome to each unit as a weighted average of observed outcomes of the units with opposite treatment, where the vector of weights is the solution of a nested pair of optimization problems. The first of these optimization problems looks for the weighting schemes that build the “synthetic covariate” defined by Abadie, Diamond, and Hainmueller (2010). Geometrically, the synthetic covariate is the projection of the covariate vector of the unit that needs imputation onto the convex hull of covariate values of the units with opposite treatment.

However, although the synthetic covariate is unique, this first optimization problem often has multiple solutions because there might be several weighting schemes building the synthetic covariate. To overcome this issue of multiplicity of solutions, the “second optimization problem” proposed by Díaz, Rau, and Rivera (2015) implements a refinement criterion for choosing one of the weighting schemes that solves the first optimization problem. More precisely, it chooses the solution that uses the covariate values of units with opposite treatment that are as close as possible to the covariate value of the unit needing imputation. Thus, blopmatching selects the weighting scheme that maximizes the unit-level covariate balance and, simultaneously, uses the units with opposite treatment with closest covariate values.

Because the blopmatching approach determines the counterfactual units and the weighting scheme by solving an optimization problem, it does not need an arbitrary rule to fix these parameters ex ante, a step that most of the nonparametric approaches currently available in the literature require.¹ Finally, as Díaz, Rau, and Rivera (2015) point out, the two optimization problems involved in the blopmatching approach can be collapsed into a single linear program, which allows us to implement this estimator by using standard linear programming techniques.

This article is organized as follows. Section 2 presents the blopmatching estimator of average treatment effects where, in addition to presenting the general framework, a simple example illustrates what the blopmatching estimator does. Section 3 shows the procedure to estimate the marginal variances of the blopmatching estimator of average treatment effects. Section 4 details the specific method to solve the aforementioned linear program. Section 5 explains the blopmatch command, which implements the blopmatching estimators. Finally, section 6 applies blopmatch to the well-known National Supported Work Demonstration, data originally analyzed by LaLonde (1986) and later used by Dehejia and Wahba (1999).

2 Blopmatching estimator of average treatment effects

2.1 A motivating example

A simple example illustrates what blopmatching does. Suppose that three units are exposed to the control treatment (“control” units), with observed outcomes Y₁ = Y₁(0), Y₂ = Y₂(0), and Y₃ = Y₃(0) and real-valued (continuous) covariates X₁, X₂, and X₃ such that X₁ < X₂ < X₃, so that the convex hull of these covariates is the closed interval [X₁, X₃] ⊂ ℝ. Moreover, suppose that a unit exposed to the active treatment (“treated” unit) has observed outcome Y₄ = Y₄(1) ∊ ℝ and covariate X₄ ∊ ℝ. Without loss of generality, we can assume that X₄ ≠ X_j, j = 1, 2, 3. The goal is to estimate the treatment effect on this fourth unit, that is, Y₄(1) − Y₄(0); the crucial complication is that Y₄(0), the outcome of the treated unit under the control treatment, is not observed. How does the blopmatching estimator impute the missing potential outcome Y₄(0)?

Let Proj(X₄) be the projection of X₄ onto the closed interval [X₁, X₃], that is, the element of that interval nearest to X₄. That projection is the synthetic covariate as defined by Abadie, Diamond, and Hainmueller (2010). Notice that when X₄ < X₁, then Proj(X₄) = X₁; when X₃ < X₄, then Proj(X₄) = X₃; and when X₄ ∊ [X₁, X₃], then Proj(X₄) = X₄.

Regardless of the specific value of the projection, we notice now that the convex combinations of covariate values of control units generating Proj(X ₄) use vectors of weights $λ = {(λ_{1}, λ_{2}, λ_{3})}^{t} \in ℝ_{+}^{3}$ ³ that solve the following system of linear equations:

\begin{array}{l} λ_{1} + λ_{2} + λ_{3} = 1 \\ λ_{1} X_{1} + λ_{2} X_{2} + λ_{3} X_{3} = \mathrm{Proj}(X_{4}) \end{array} \qquad (1)

Because in general there are multiple vectors $λ = {(λ_{1}, λ_{2}, λ_{3})}^{t} \in ℝ_{+}^{3}$ solving this linear system of equations, the solution is actually a “solution set”. Observe also that this solution set is, indeed, the solution of the next optimization problem:

\begin{array}{l} \min_{(λ_{1}, λ_{2}, λ_{3})^{t} \in ℝ_{+}^{3}} \left| X_{4} - \sum_{j = 1}^{3} λ_{j} X_{j} \right| \\ \text{s.t.} \quad λ_{1} + λ_{2} + λ_{3} = 1 \end{array} \qquad (2)

Problem (2) is the first optimization problem used in the blopmatching estimator. Note that any weighting scheme solving this problem attains the best possible covariate balance or, equivalently, explains the covariate of the treated unit through the covariates of the control units as well as possible. However, because more than one weighting scheme may attain the best possible covariate balance, a refinement criterion should be implemented to choose one of them. To this end, we notice that any vector λ in that “solution set” uses covariate X_j, j = 1, 2, 3, whenever λ_j > 0. Thus, the weighted sum of squared distances between X₄ and the covariates of the control units is given by

\sum_{j = 1}^{3} λ_{j} {| X_{4} - X_{j} |}^{2}

The second optimization problem of the blopmatching approach minimizes this function among the solutions of the first optimization problem. Hence, using (1), we see that this problem can be posed as the following linear program in terms of the weights:

\begin{array}{l} \min_{(λ_{1}, λ_{2}, λ_{3})^{t} \in ℝ_{+}^{3}} \sum_{j = 1}^{3} λ_{j} | X_{4} - X_{j} |^{2} \\ \text{s.t.} \quad λ_{1} X_{1} + λ_{2} X_{2} + λ_{3} X_{3} = \mathrm{Proj}(X_{4}) \\ \phantom{\text{s.t.} \quad} λ_{1} + λ_{2} + λ_{3} = 1 \end{array} \qquad (3)

It is clear that when X ₄ < X ₁, then the solution of (3) is λ ^b = (1, 0, 0)^t, while when X ₃ < X ₄, that solution is λ ^b = (0, 0, 1)^t. However, when Proj(X ₄) = X ₄, the solution depends on whether X ₁ < X ₄ < X ₂ or X ₂ < X ₄ < X ₃; specifically, the solution of that optimization problem is given by $λ^{b} = {(λ_{1}^{b}, λ_{2}^{b}, λ_{3}^{b})}^{t} \in ℝ_{+}^{3}$ such that

\begin{array}{ll} λ_{1}^{b} = \frac{X_{2} - X_{4}}{X_{2} - X_{1}}, \; λ_{2}^{b} = \frac{X_{4} - X_{1}}{X_{2} - X_{1}}, \; λ_{3}^{b} = 0 & \text{if } X_{1} < X_{4} < X_{2} \\ λ_{1}^{b} = 0, \; λ_{2}^{b} = \frac{X_{3} - X_{4}}{X_{3} - X_{2}}, \; λ_{3}^{b} = \frac{X_{4} - X_{2}}{X_{3} - X_{2}} & \text{if } X_{2} < X_{4} < X_{3} \end{array}

These weights satisfy both constraints of (3); for instance, in the first case, λ₁^b X₁ + λ₂^b X₂ = X₄. Thus, the blopmatching estimator imputes the missing potential outcome of the treated unit as

\hat{Y}_{4}^{b} (0) = λ_{1}^{b} Y_{1} + λ_{2}^{b} Y_{2} + λ_{3}^{b} Y_{3} = \begin{cases} Y_{1} & \text{if } X_{4} < X_{1} \\ Y_{3} & \text{if } X_{4} > X_{3} \\ \left( \frac{X_{2} - X_{4}}{X_{2} - X_{1}} \right) Y_{1} + \left( \frac{X_{4} - X_{1}}{X_{2} - X_{1}} \right) Y_{2} & \text{if } X_{1} < X_{4} < X_{2} \\ \left( \frac{X_{3} - X_{4}}{X_{3} - X_{2}} \right) Y_{2} + \left( \frac{X_{4} - X_{2}}{X_{3} - X_{2}} \right) Y_{3} & \text{if } X_{2} < X_{4} < X_{3} \end{cases}

Lastly, the blopmatching estimator of the unit-level treatment effect that we wanted to estimate is given by $Y_{4} - {\hat{Y}}_{4}^{b} (0)$ . In the next section, we present the extension of this simple case to a more general setting when covariates are of higher dimension.
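As a minimal sketch of the one-dimensional case, the following Python snippet computes the weights and the imputed outcome for a scalar covariate. The function names `blop_weights_1d` and `impute_1d` and the numbers in the usage lines are illustrative assumptions and are not part of blopmatch. Rather than calling a linear programming solver, the code uses the closed-form solution of (3): it interpolates between the two control covariates that bracket the unit needing imputation and falls back to the nearest endpoint when the unit lies outside the convex hull.

```python
def blop_weights_1d(x_new, xs):
    """Closed-form solution of (3) for a scalar covariate.

    x_new: covariate of the unit needing imputation.
    xs:    covariates of the units with opposite treatment.
    Returns one weight per element of xs (nonnegative, summing to one).
    """
    order = sorted(range(len(xs)), key=lambda j: xs[j])
    weights = [0.0] * len(xs)
    if x_new <= xs[order[0]]:
        # Projection is the left endpoint of the convex hull.
        weights[order[0]] = 1.0
    elif x_new >= xs[order[-1]]:
        # Projection is the right endpoint of the convex hull.
        weights[order[-1]] = 1.0
    else:
        # Interpolate between the two neighbors that bracket x_new.
        for a, b in zip(order, order[1:]):
            if xs[a] <= x_new <= xs[b]:
                w_b = (x_new - xs[a]) / (xs[b] - xs[a])
                weights[a], weights[b] = 1.0 - w_b, w_b
                break
    return weights


def impute_1d(x_new, xs, ys):
    """Weighted average of observed outcomes ys using the weights above."""
    return sum(w * y for w, y in zip(blop_weights_1d(x_new, xs), ys))


# Hypothetical data: three control units and one treated unit at 0.25.
controls_x = [0.0, 1.0, 3.0]
controls_y = [10.0, 20.0, 40.0]
print(blop_weights_1d(0.25, controls_x))        # [0.75, 0.25, 0.0]
print(impute_1d(0.25, controls_x, controls_y))  # 12.5
```

In this hypothetical example, the treated unit sits closer to the first control unit, so that unit receives the larger weight.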

2.2 General framework

The binary program to be evaluated is defined by the collection

Ω_{N} = \{ (X_{i}, Y_{i}, W_{i}) \in ℝ^{k} \times ℝ \times \{0, 1\} : i \in \{1, \dots, N\} \}

where N is the number of units, X_i ∊ ℝ^k denotes the observed k-dimensional vector of covariates (or pretreatment variables) of unit i ∊ {1,…, N}, whose observed outcome is Y_i ∊ ℝ, and W_i ∊ {0, 1} indicates the treatment received by this unit (active treatment when W_i = 1, control treatment when W_i = 0). The number of treated units is $N_{1} = \sum_{i = 1}^{N} W_{i}$, and the number of control units is N₀ = N − N₁. To simplify the exposition of the method, the following nonrestrictive conditions are assumed throughout the article.

i) N ₀ > 1 and N ₁ > 1,

ii) W_i = 0 for i ∊ {1,…, N ₀}, and W_i = 1 for i ∊ {N ₀ + 1,…, N}.

Condition ii states that the first N ₀ units of the sample are control units, while the remaining N ₁ = N − N ₀ are the treated ones. Under condition ii, the covariate vectors of the control units are X ₁ ,…, X _N ₀, whereas the covariate vectors of the treated units are X _N ₀ ₊ _j, j = 1,…, N ₁.

We are interested in estimating the average treatment effect (ATE) of the program, that is, τ = $E$ {Y_i (1) − Y_i (0)}, and the average treatment effect on the treated (ATT), that is, τ _tre = $E$ {Y_i (1)−Y_i (0)|W_i = 1}, where Y_i (0) denotes the outcome of unit i under control treatment and Y_i (1) denotes the outcome of unit i under active treatment. In this context, Y_i (0) and Y_i (1) are called “potential outcomes” (Rubin 1974) because, for each unit, we never observe both of them in the data at hand. That is, Y_i (0) is not observed if W_i = 1 and Y_i (1) is not observed if W_i = 0. Thus, the fundamental problem of causal inference is that the missing potential outcomes must be estimated (or imputed) to estimate either the ATE or the ATT.

Throughout, the Simplex of dimension n ∊ ℕ is denoted as Δ _n = {(λ ₁ ,…, λ_n )^t ∊ ℝ₊ ⁿ : λ ₁ + · · · + λ_n = 1}. Having defined the Simplex, in this article, we are interested in the estimator of Y_i (1 − W_i ), the missing potential outcome of unit i, of the form

\hat{Y}_{i} (1 - W_{i}) = \begin{cases} \sum_{j = 1}^{N_{1}} λ_{i, j} Y_{N_{0} + j} & \text{if } i \in \{1, \dots, N_{0}\} \\ \sum_{j = 1}^{N_{0}} λ_{i, j} Y_{j} & \text{if } i \in \{N_{0} + 1, \dots, N\} \end{cases}

with λ _i = (λ _i, ₁ ,…, λ _i,N ₁)^t ∊ Δ _N ₁ when W_i = 0 and λ _i = (λ _i, ₁ ,…, λ _i,N ₀)^t ∊ Δ _N ₀ when W_i = 1.

For a treated unit i ∊ {N₀ + 1,…, N} whose Y_i(0) needs to be estimated, the blopmatching method uses the vector of weights that solves the following linear program, a straightforward extension of (3) to a covariate vector of higher dimension,

\begin{array}{l} \min_{(λ_{1}, \dots, λ_{N_{0}})^{t} \in ℝ_{+}^{N_{0}}} \sum_{j = 1}^{N_{0}} λ_{j} ‖ X_{i} - X_{j} ‖^{2} \\ \text{s.t.} \quad \sum_{j = 1}^{N_{0}} λ_{j} X_{j} = \mathrm{Proj}(X_{i}) \\ \phantom{\text{s.t.} \quad} \sum_{j = 1}^{N_{0}} λ_{j} = 1 \end{array} \qquad (4)

where || · || denotes a given norm in the underlying vector space and Proj(X_i) ∊ ℝ^k is the projection of X_i onto the convex hull of the covariate values of the control units. Recall that the convex hull of the vectors X₁,…, X_{N₀} is the set formed by all their convex combinations, denoted by conv{X₁,…, X_{N₀}} = {λ₁X₁ + · · · + λ_{N₀}X_{N₀} : (λ₁,…, λ_{N₀})^t ∊ Δ_{N₀}} ⊂ ℝ^k. Note that Proj(X_i) is the unique vector in that convex hull that satisfies

‖ X_{i} - \mathrm{Proj}(X_{i}) ‖ = \min_{x \in \mathrm{conv}\{X_{1}, \dots, X_{N_{0}}\}} ‖ X_{i} - x ‖ \qquad (5)

Let $λ_{i}^{b} = {(λ_{i, 1}^{b}, \dots, λ_{i, N_{0}}^{b})}^{t} \in Δ_{N_{0}}$ denote the solution of (4), which under continuous covariates is unique with probability one. Then, the blopmatching estimator of the missing potential outcome of treated unit i, Y_i (0), is

{\hat{Y}}_{i}^{b} (0) = \sum_{j = 1}^{N_{0}} λ_{i, j}^{b} Y_{j}

A completely analogous procedure can be presented when the unit needing imputation belongs to the control group, that is, when i ∊ {1,…, N ₀} and Y_i (1) needs to be imputed. Specifically, for that case, Δ _N ₀, {1,…, N ₀}, and Proj(X _i ) in (4) must be replaced, respectively, by Δ _N ₁, {N ₀ + 1,…, N}, and the projection of X _i onto the convex hull of {X _N ₀ ₊₁ ,…, X _N }. Let $λ_{i}^{b} = {(λ_{i, 1}^{b}, \dots, λ_{i, N_{1}}^{b})}^{t} \in Δ_{N_{1}}$ denote the solution of the optimization problem configured for a control unit i. Then, the blopmatching estimator of its missing potential outcome, Y_i (1), is

{\hat{Y}}_{i}^{b} (1) = \sum_{j = 1}^{N_{1}} λ_{i, j}^{b} Y_{N_{0} + j}

Having imputed all the missing potential outcomes, we define the blopmatching estimators of the ATE and the ATT as follows:

Definition 1. The blopmatching estimators of τ (ATE) and τ _tre (ATT) are, respectively,

\begin{array}{l} \hat{τ}^{b} = \frac{1}{N} \sum_{i = 1}^{N} \left[ W_{i} \{ Y_{i} - \hat{Y}_{i}^{b} (0) \} + (1 - W_{i}) \{ \hat{Y}_{i}^{b} (1) - Y_{i} \} \right] \\ \hat{τ}_{tre}^{b} = \frac{1}{N_{1}} \sum_{i = N_{0} + 1}^{N} \{ Y_{i} - \hat{Y}_{i}^{b} (0) \} \end{array}
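Once every missing potential outcome has been imputed, Definition 1 reduces to two averages of unit-level effects. The Python sketch below illustrates this step only; the function name `blop_effects`, its argument names, and the toy numbers are assumptions made for illustration, and the imputed values are taken as given rather than computed by solving (4).

```python
def blop_effects(y, w, y_imputed):
    """ATE and ATT from observed outcomes and imputed counterfactuals.

    y:         observed outcomes Y_i.
    w:         treatment indicators W_i (1 = treated, 0 = control).
    y_imputed: imputed missing potential outcome of each unit, that is,
               Y_i(0) for treated units and Y_i(1) for control units.
    """
    unit_effects = [
        (yi - yh) if wi == 1 else (yh - yi)
        for yi, wi, yh in zip(y, w, y_imputed)
    ]
    ate = sum(unit_effects) / len(unit_effects)
    treated = [e for e, wi in zip(unit_effects, w) if wi == 1]
    att = sum(treated) / len(treated)
    return ate, att


# Hypothetical sample: two control units followed by two treated units.
ate, att = blop_effects(
    y=[1.0, 2.0, 5.0, 7.0],
    w=[0, 0, 1, 1],
    y_imputed=[4.0, 5.0, 2.0, 3.0],
)
print(ate, att)  # 3.25 3.5
```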

We highlight that, to impute the missing potential outcome of any unit, the blopmatching approach does not exclude a priori any unit with opposite treatment. We also highlight that the blopmatching approach does not need to fix an exogenous number of neighbors, a crucial tuning parameter in most nonparametric approaches (see Imbens and Wooldridge [2009]).

We note now that for a treated unit i ∊ {N₀ + 1,…, N}, the observed outcome of control unit j ∊ {1,…, N₀} participates in the realization of $\hat{Y}_{i}^{b} (0)$ whenever $λ_{i, j}^{b}$ is strictly positive. Because the number of equality constraints of (4) is k + 1, it follows that a “basic optimal solution” of that problem has, at most, k + 1 strictly positive components. Consequently, the blopmatching estimator uses the outcomes of at most k + 1 control units when it imputes Y_i(0). We observe, however, that these units are not necessarily the k + 1 nearest neighbors of unit i (in terms of covariate distances); indeed, they might not even be among the M nearest neighbors for some fixed integer M. For instance, let k = 1, and suppose that a treated unit with covariate X_i = 1 − ε, with 0 < ε < 1, has to be matched against control units with covariate values X₁ = 0, X₂ = 1, and X_j = 1 + 1/j, j = 3,…, N₀. In this case, $λ_{i}^{b} = (ε, 1 - ε, 0, \dots, 0)^{t} \in Δ_{N_{0}}$ is the solution of (4), while for ε small enough, the nearest neighbors of unit i are, in order of closeness, X₂, X_{N₀}, X_{N₀−1},…, X₃, X₁. Hence, the units used to impute the missing potential outcome of unit i are its nearest and farthest neighbors.
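The example can be checked numerically. The short Python script below is only a sanity check under assumed values (ε = 0.01 and N₀ = 6 are arbitrary choices, and all variable names are illustrative): it confirms that the stated weights reproduce X_i as a convex combination and lists the control units in order of closeness.

```python
eps = 0.01   # assumed value of epsilon
n0 = 6       # assumed number of control units

# Control covariates X_1 = 0, X_2 = 1, X_j = 1 + 1/j for j = 3, ..., n0.
controls = [0.0, 1.0] + [1.0 + 1.0 / j for j in range(3, n0 + 1)]
x_i = 1.0 - eps

# Weights stated in the text: (eps, 1 - eps, 0, ..., 0).
weights = [eps, 1.0 - eps] + [0.0] * (n0 - 2)

# The weights are feasible for (4): they sum to one and reproduce x_i.
combo = sum(lam * x for lam, x in zip(weights, controls))

# Control units (1-based labels) sorted from nearest to farthest.
ranking = sorted(range(1, n0 + 1), key=lambda j: abs(x_i - controls[j - 1]))

# Units that actually receive positive weight.
used = [j for j, lam in enumerate(weights, start=1) if lam > 0]

print(abs(combo - x_i) < 1e-12)  # True
print(ranking)                   # [2, 6, 5, 4, 3, 1]
print(used)                      # [1, 2]
```

The units with positive weight are the first and the last entries of the ranking, that is, the nearest and the farthest neighbors.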

We end this section by noting that, under the assumption of strong ignorability and some additional mild conditions, Díaz, Rau, and Rivera (2015) prove that the blopmatching estimators of the ATE and ATT are consistent. We recall that the assumption of strong ignorability holds if two conditions are satisfied: unconfoundedness, which means that the treatment assignment and the potential outcomes are conditionally independent given covariates (this condition is also known as selection on observed covariates); and positivity, which means that the conditional probability of being treated given covariates is strictly above zero and strictly below one (this condition is also known as overlap in the propensity score). See chapter 12 in Imbens and Rubin (2015) for a detailed discussion of observational studies under strong ignorability and chapter 21 for a procedure to assess the plausibility of this assumption in practice.

3 Estimating the marginal variance

Díaz, Rau, and Rivera (2015) propose consistent estimators of the variances of $\hat{τ}^{b}$ and $\hat{τ}_{tre}^{b}$, whose expressions use an estimator of the conditional variance σ²(x) = $V$(Y_i | X_i = x, W_i = w). The estimator of σ²(x) relies on the solution of an optimization problem similar to (4). For any unit i, instead of using covariates of units belonging to the group with opposite treatment, this problem uses covariates of units in the group with the same treatment as unit i, leaving out the covariate vector of unit i itself. More precisely, for a treated unit i ∊ {N₀ + 1,…, N}, we denote by $φ_{- i}^{b} = (φ_{1}^{b}, \dots, φ_{i - 1}^{b}, φ_{i + 1}^{b}, \dots, φ_{N_{1}}^{b})^{t} \in Δ_{N_{1} - 1}$ the solution of the following linear optimization problem,

\begin{array}{l} \min_{(φ_{1}, \dots, φ_{i - 1}, φ_{i + 1}, \dots, φ_{N_{1}})^{t} \in ℝ_{+}^{N_{1} - 1}} \sum_{j = 1, j \neq i}^{N_{1}} φ_{j} ‖ X_{i} - X_{N_{0} + j} ‖^{2} \\ \text{s.t.} \quad \sum_{j = 1, j \neq i}^{N_{1}} φ_{j} X_{N_{0} + j} = \mathrm{Proj}^{⋆} (X_{i}) \\ \phantom{\text{s.t.} \quad} \sum_{j = 1, j \neq i}^{N_{1}} φ_{j} = 1 \end{array} \qquad (6)

where Proj ^⋆ (X _i ) is the projection of X _i onto conv{X _N ₀ ₊₁ ,…, X _i ₋ ₁, X _i ₊₁ ,…, X _N }. It can be shown (see Díaz, Rau, and Rivera [2015]) that the next expression is an asymptotically unbiased estimator of the conditional variance,

\hat{σ}_{i}^{2} = \frac{\left( Y_{i} - \sum_{j = 1, j \neq i}^{N_{1}} φ_{j}^{b} Y_{N_{0} + j} \right)^{2}}{1 + ‖ φ_{- i}^{b} ‖_{2}^{2}}

where || · ||₂ is the Euclidean norm in $ℝ^{N_{1} - 1}$.
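To make the arithmetic of this estimator concrete, the Python sketch below evaluates it for given weights. The function name `sigma2_hat` and the toy numbers are hypothetical, and the weights φ are assumed to have been obtained already by solving (6).

```python
def sigma2_hat(y_i, y_same_group, phi):
    """Conditional variance estimate for one unit.

    y_i:          observed outcome of unit i.
    y_same_group: outcomes of the other units with the same treatment.
    phi:          weights solving (6), aligned with y_same_group.
    """
    fitted = sum(p * y for p, y in zip(phi, y_same_group))
    squared_norm = sum(p * p for p in phi)  # squared Euclidean norm of phi
    return (y_i - fitted) ** 2 / (1.0 + squared_norm)


# Hypothetical values: the fitted outcome is 11, so the squared residual
# is 1, and the denominator is 1 + 0.25 + 0.25 = 1.5.
print(sigma2_hat(10.0, [8.0, 14.0], [0.5, 0.5]))
```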

For i ∊ {1,…, N ₀}, a completely analogous procedure can be implemented to obtain an estimator of the conditional variance σ ²(X _i ). Having estimated all the conditional variances, we find the following expressions to be consistent estimators (see Díaz, Rau, and Rivera [2015] for a detailed proof and the required conditions) of the marginal variances of the blopmatching estimators of the ATE and ATT,

\begin{array}{l} \hat{V} [\hat{τ}^{b}] = \frac{1}{N^{2}} \sum_{i = 1}^{N} \left[ (Y_{i} - \hat{Y}_{i}^{b})^{2} + \{ (1 + c_{i}^{[1]})^{2} - (1 + c_{i}^{[2]}) \} \hat{σ}_{i}^{2} \right] - \frac{(\hat{τ}^{b})^{2}}{N} \\ \hat{V} [\hat{τ}_{tre}^{b}] = \frac{1}{N_{1}^{2}} \sum_{i = N_{0} + 1}^{N} (Y_{i} - \hat{Y}_{i}^{b})^{2} + \frac{1}{N_{1}^{2}} \sum_{i = 1}^{N_{0}} \{ (c_{i}^{[1]})^{2} - c_{i}^{[2]} \} \hat{σ}_{i}^{2} - \frac{(\hat{τ}_{tre}^{b})^{2}}{N_{1}} \end{array}

where

c_{i}^{[s]} = \sum_{j : W_{j} \neq W_{i}} (λ_{j, i}^{b})^{s}, \quad s = 1, 2

that is, the sum of weights, to the power of s, corresponding to unit i when i is used to impute the missing potential outcomes of units in the group with opposite treatment.
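The quantities c_i^{[s]} can be accumulated in a single pass over the matching weights. The Python sketch below assumes a hypothetical storage layout, chosen only for illustration, in which `weights[j][i]` holds the weight that unit i receives when the missing outcome of unit j is imputed; the function name `match_counts` is likewise an assumption.

```python
def match_counts(weights, s):
    """Sum of the s-th powers of the weights received by each donor unit.

    weights: dict mapping each imputed unit j to a dict {donor i: weight}.
    s:       power applied to every weight (1 or 2 in the article).
    Returns a dict {donor i: c_i^[s]}; unused donors are absent.
    """
    totals = {}
    for donors in weights.values():
        for i, lam in donors.items():
            totals[i] = totals.get(i, 0.0) + lam ** s
    return totals


# Hypothetical weights: units 3 and 4 are imputed from donors 1 and 2.
w = {3: {1: 0.25, 2: 0.75}, 4: {2: 1.0}}
print(match_counts(w, 1))  # {1: 0.25, 2: 1.75}
print(match_counts(w, 2))  # {1: 0.0625, 2: 1.5625}
```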

4 Implementation

This section presents our strategy to solve (4). Because (4) and (6) have the same structure, the same procedure can be adapted to solve the linear programs given by (6) and thus to estimate the marginal variances of the treatment-effect estimators.

To solve (4), we first need to obtain the projection of X_i onto the convex hull of the covariate values of the units with the opposite treatment (that is, with treatment equal to 1 − W_i). For the sake of presentation, we initially suppose that i ∊ {N₀ + 1,…, N}, so that i is a treated unit; for reasons that will become clear later in this section, we use the 1-norm (denoted by || · ||₁) to solve the projection problem given by (5). Recall that, for X = (x₁,…, x_n)^t ∊ ℝⁿ, ||X||₁ = |x₁| + · · · + |x_n|. With these choices, (5) becomes a linear programming problem. Notice now that Proj(X_i) is attained by the weighting schemes that solve the following optimization problem:

\min_{λ = (λ_{1}, \dots, λ_{N_{0}})^{t} \in Δ_{N_{0}}} \left‖ X_{i} - \sum_{j = 1}^{N_{0}} λ_{j} X_{j} \right‖_{1} \qquad (7)

Let c _i = X _i − Proj(X _i ) ∊ ℝ ^k , and let λ _i ^∗ ∊ Δ _N ₀ denote a solution to (7). Then, it is clear that (c _i, λ _i ^∗)^t ∊ ℝ ^k × Δ _N ₀ is a solution to the next optimization problem:

\begin{matrix} \min_{{(c, λ)}^{t} \in ℝ^{k} \times Δ_{N_{0}}} ‖ c ‖_{1} \\ s.t. \\ c + \sum_{j = 1}^{N_{0}} λ_{j} X_{j} = X_{i} \end{matrix}

Denoting ι_k = (1,…, 1)^t ∊ ℝ^k and $| c | = (| c_{1} |, \dots, | c_{k} |)^{t} \in ℝ_{+}^{k}$, it follows directly that $‖ c ‖_{1} = ι_{k}^{t} | c |$. Moreover, there is a pair of vectors $α, β \in ℝ_{+}^{k}$ such that

c = | c | - 2 β and c = - | c | + 2 α

implying that c = α − β and | c | = α + β . Consequently, $‖ c ‖_{1} = ι_{k}^{t} (α + β)$ , and then

Proj (X_{i}) = X_{i} - (α_{i}^{*} - β_{i}^{*})

with $α_{i}^{*}, β_{i}^{*} \in ℝ^{k}$ as the corresponding components of a solution to the next linear programming problem,

\begin{array}{l} \min_{(α, β, λ)^{t} \in ℝ_{+}^{k} \times ℝ_{+}^{k} \times ℝ_{+}^{N_{0}}} ι_{k}^{t} α + ι_{k}^{t} β \\ \text{s.t.} \quad \left[ \begin{matrix} I_{k} & - I_{k} & X_{1} & \dots & X_{N_{0}} \\ 0_{k}^{t} & 0_{k}^{t} & 1 & \dots & 1 \end{matrix} \right] \left[ \begin{matrix} α \\ β \\ λ \end{matrix} \right] = \left[ \begin{matrix} X_{i} \\ 1 \end{matrix} \right] \end{array} \qquad (8)

where I_k stands for the identity matrix of $ℝ^{k \times k}$ and 0_k ∊ ℝ^k is the vector with all entries zero.
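The change of variables that turns the 1-norm objective into the linear objective of (8) can be verified with a few lines of Python. This is only an illustrative check; the function name `split_signed` and the test vector are assumptions. It takes α and β as the positive and negative parts of c, which is the choice implied by c = −|c| + 2α and c = |c| − 2β.

```python
def split_signed(c):
    """Split c into nonnegative parts with c = alpha - beta."""
    alpha = [max(v, 0.0) for v in c]    # positive part of each entry
    beta = [max(-v, 0.0) for v in c]    # negative part of each entry
    return alpha, beta


c = [1.5, -2.0, 0.0]   # hypothetical residual X_i - Proj(X_i)
alpha, beta = split_signed(c)

# c = alpha - beta and |c| = alpha + beta, entry by entry.
recovered = [a - b for a, b in zip(alpha, beta)]
absolute = [a + b for a, b in zip(alpha, beta)]

# The 1-norm of c equals the linear objective of (8).
objective = sum(alpha) + sum(beta)

print(recovered == c)                      # True
print(absolute == [abs(v) for v in c])     # True
print(objective)                           # 3.5
```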

In the following, for j = 1,…, n, $e_{j}^{n}$ denotes the jth vector of the canonical basis of ℝⁿ. Because (8) has k + 1 equality constraints, to implement the “revised Simplex method” (see Matoušek and Gärtner [2007]) we should “mark” k + 1 linearly independent (l.i.) columns of the (k + 1) × (2k + N₀) matrix on the left-hand side of the constraints of (8) (this matrix is denoted by Γ). To do this, setting K = {1,…, k} and denoting $X_{i} = (X_{1, i}, \dots, X_{k, i})^{t}$, we define

B_{i} = \{ s \in K : X_{s, i} \geq X_{s, N_{0}} \} \cup \{ k + s : s \in K, X_{s, i} < X_{s, N_{0}} \} \cup \{ 2 k + N_{0} \}

It is clear that column 2k + N₀ of matrix Γ, namely, the vector $(X_{N_{0}}, 1)^{t} \in ℝ^{k + 1}$, is l.i. of the first 2k columns of that matrix. It is also clear that, by construction, the columns of Γ marked by $B_{i} \setminus \{ 2 k + N_{0} \}$ are pairwise orthogonal. Consequently, the columns of Γ marked by $B_{i}$ form a basis of $ℝ^{k + 1}$. To show that these columns give a “basic feasible solution” for (8), we show that the solution induced by them is nonnegative. In this regard, it is easy to check that such a solution is given by $\bar{α} = (\bar{α}_{j}) \in ℝ^{k}$, $\bar{β} = (\bar{β}_{j}) \in ℝ^{k}$, and $\bar{λ} \in ℝ^{N_{0}}$ such that

\begin{array}{l} \bar{α}_{j} = \max \{ X_{j, i} - X_{j, N_{0}}, 0 \}, \quad j \in \{1, \dots, k\} \\ \bar{β}_{j} = - \min \{ X_{j, i} - X_{j, N_{0}}, 0 \}, \quad j \in \{1, \dots, k\} \\ \bar{λ} = e_{N_{0}}^{N_{0}} \end{array}

Hence, starting from the basic feasible solution above, we can solve (8) by the revised Simplex method, which gives the vector Proj(X_i), as desired. All the commands needed for that purpose, programmed in Mata, are provided with this article.
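The starting point described above can be written down and checked directly. The Python sketch below is an illustration under assumed names (`basis_indices`, `initial_bfs`, `constraint_lhs`) and toy data; it is not the Mata code shipped with the article. It builds the marked column indices, forms the initial solution, and verifies that this solution is nonnegative and satisfies the equality constraints of (8).

```python
def basis_indices(x_i, controls):
    """1-based indices of the columns of Gamma marked by B_i."""
    k, n0 = len(x_i), len(controls)
    last = controls[-1]  # covariate vector of control unit N0
    marked = [s + 1 if x_i[s] >= last[s] else k + s + 1 for s in range(k)]
    return marked + [2 * k + n0]


def initial_bfs(x_i, controls):
    """Stacked vector (alpha, beta, lambda) of the starting solution."""
    last = controls[-1]
    alpha = [max(a - b, 0.0) for a, b in zip(x_i, last)]
    beta = [max(b - a, 0.0) for a, b in zip(x_i, last)]
    lam = [0.0] * (len(controls) - 1) + [1.0]  # all weight on unit N0
    return alpha + beta + lam


def constraint_lhs(z, controls, k):
    """Left-hand side Gamma z of the equality constraints of (8)."""
    alpha, beta, lam = z[:k], z[k:2 * k], z[2 * k:]
    rows = [
        alpha[s] - beta[s] + sum(l * x[s] for l, x in zip(lam, controls))
        for s in range(k)
    ]
    return rows + [sum(lam)]


# Hypothetical data: k = 2 covariates and N0 = 3 control units.
x_i = [2.0, 0.5]
controls = [[0.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
z = initial_bfs(x_i, controls)

print(basis_indices(x_i, controls))    # [1, 4, 7]
print(z)                               # [1.0, 0.0, 0.0, 1.5, 0.0, 0.0, 1.0]
print(constraint_lhs(z, controls, 2))  # [2.0, 0.5, 1.0], that is, (X_i, 1)
```

Exactly k + 1 components of the stacked vector can be nonzero, and they sit in the marked positions, which is what makes it a valid starting basis for a Simplex-type method.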

Having found Proj(X _i ), we can solve (4). Specifically, our commands implement the solution using either the Euclidean distance or the Mahalanobis distance (the choice is left to the researchers). In what follows, the norm is ||·||, and, without loss of generality, we present the implementation for a treated unit i ∊ {N ₀ + 1,…, N}.

We observe now that $λ_{i}^{b} = (λ_{i, 1}^{b}, \dots, λ_{i, N_{0}}^{b})^{t} \in Δ_{N_{0}}$, which solves (4), is also a solution of the next linear optimization problem:

\begin{array}{l} \min_{λ \in ℝ_{+}^{N_{0}}} \sum_{j = 1}^{N_{0}} λ_{j} ‖ X_{j} - X_{i} ‖^{2} \\ \text{s.t.} \quad \left[ \begin{matrix} X_{1} & \dots & X_{N_{0}} \\ 1 & \dots & 1 \end{matrix} \right] \left[ \begin{matrix} λ_{1} \\ \vdots \\ λ_{N_{0}} \end{matrix} \right] = \left[ \begin{matrix} \mathrm{Proj}(X_{i}) \\ 1 \end{matrix} \right] \end{array} \qquad (9)

We observe also that any solution $λ_{i}^{*} = (λ_{i, 1}^{*}, \dots, λ_{i, N_{0}}^{*})^{t}$ of (7) is a feasible point for (9), although not necessarily a basic solution. Nonetheless, it is not difficult to see that the subset

C_{i} = \{ j \in \{1, \dots, N_{0}\} : λ_{i, j}^{*} > 0 \}

identifies $\# C_{i}$ ≤ k + 1 linearly independent columns of the coefficient matrix on the left-hand side of the constraints of (9). If we assume that N₀ > k, N₁ > k, and the covariates are continuous, then, under general conditions (see Cover and Efron [1967]), any k + 1 columns of that matrix are l.i. with probability one (these columns are in “general position”). Consequently, without loss of generality, we can assume that $\# C_{i}$ = k + 1, implying that the vector $λ_{i}^{*}$ is a basic feasible solution of (9). Using this vector as the starting point, we can again apply the revised Simplex method to solve (9).² For N₀ ≤ k (or N₁ ≤ k), (9) can be converted into a linear programming problem in standard form so that the revised Simplex method can be used directly with $λ_{i}^{*}$ as a basic feasible solution. To implement the blopmatch command in Stata, we programmed all the routines involved in solving this optimization problem in Mata.

5 The blopmatch command

5.1 Syntax

blopmatch ( ovar omvarlist ) ( tvar ) [if] [in] [weight] [ , stat options ]

ovar is a binary, count, continuous, fractional, or nonnegative outcome of interest. omvarlist specifies the covariates in the outcome model. omvarlist may contain factor variables; see [U] 11.4.3 Factor variables.

tvar must contain integer values representing the treatment levels. Only two treatment levels are allowed.

5.2 Stored results

blopmatch stores the following in e():

5.3 Example

We illustrate the use of blopmatch to estimate the average treatment effect of the National Supported Work Demonstration, a labor training program, on postintervention earnings, using data from LaLonde (1986):

Table 1 describes all variables in this dataset.

Table 1. Description of all variables in nsw.dta

Variable	Description
treat	Treatment indicator (1 if treated, 0 otherwise)
age	Age
education	Education
black	1 if black, 0 otherwise
hispanic	1 if Hispanic, 0 otherwise
married	1 if married, 0 otherwise
nodegree	1 if no degree, 0 otherwise
re75	Earnings in 1975 (in 1978 dollars)
re78	Earnings in 1978 (in 1978 dollars)

Now we use blopmatch to estimate the average treatment effect of treat on re78. Subject distances are measured using the Mahalanobis distances defined by covariates age, education, black, hispanic, married, nodegree, and (a standardized version of) re75.

Thus, the ATE of the National Supported Work Demonstration on postintervention earnings is 975.6 USD (of 1978). In contrast, the ATE estimated with teffects nnmatch depends on the number of neighbors. For example,

Supplemental Material

Supplemental Material, sj-zip-1-stj-10.1177_1536867X211000021 for Implementing blopmatching in Stata by Juan D. Díaz, Iván Gutiérrez and Jorge Rivera in The Stata Journal

6 Acknowledgments

The authors gratefully acknowledge financial support from ANID PIA/APOYO AFB180003 and from FONDECYT-Chile, grant number 1130468.

7 Programs and supplemental materials

To install a snapshot of the corresponding software files as they existed at the time of publication of this article, type

You can also install this package from GitHub by typing the following commands:

References

Abadie, A., A. Diamond, and J. Hainmueller. 2010. Synthetic control methods for comparative case studies: Estimating the effect of California’s tobacco control program. Journal of the American Statistical Association 105: 493–505. https://doi.org/10.1198/jasa.2009.ap08746.

Abadie, A., and G. W. Imbens. 2006. Large sample properties of matching estimators for average treatment effects. Econometrica 74: 235–267. https://doi.org/10.1111/j.1468-0262.2006.00655.x.

Axler, S. 2014. Linear Algebra Done Right. 3rd ed. Cham, Switzerland: Springer.

Cover, T. M., and B. Efron. 1967. Geometrical probability and random points on a hypersphere. Annals of Mathematical Statistics 38: 213–220. http://doi.org/10.1214/aoms/1177699073.

Dehejia, R. H., and S. Wahba. 1999. Causal effects in nonexperimental studies: Reevaluating the evaluation of training programs. Journal of the American Statistical Association 94: 1053–1062. https://doi.org/10.1080/01621459.1999.10473858.

Díaz, J., T. Rau, and J. Rivera. 2015. A matching estimator based on a bilevel optimization problem. Review of Economics and Statistics 97: 803–812. https://doi.org/10.1162/REST_a_00504.

Imbens, G. W., and D. B. Rubin. 2015. Causal Inference in Statistics, Social, and Biomedical Sciences. Cambridge: Cambridge University Press.

Imbens, G. W., and J. M. Wooldridge. 2009. Recent developments in the econometrics of program evaluation. Journal of Economic Literature 47: 5–86. https://doi.org/10.1257/jel.47.1.5.

LaLonde, R. J. 1986. Evaluating the econometric evaluations of training programs. American Economic Review 76: 604–620.

Matoušek, J., and B. Gärtner. 2007. Understanding and Using Linear Programming. Berlin: Springer.

Rubin, D. B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66: 688–701. https://doi.org/10.1037/h0037350.
