In this article, I introduce the sivqr command, which estimates the coefficients of the instrumental variables quantile regression model introduced by Chernozhukov and Hansen (2005, Econometrica 73: 245–261). The sivqr command offers several advantages over the existing ivqreg and ivqreg2 commands for estimating this instrumental variables quantile regression model, which complements the alternative “triangular model” behind cqiv and the “local quantile treatment effect” model of ivqte. Computationally, sivqr implements the smoothed estimator of Kaplan and Sun (2017, Econometric Theory 33: 105–157), who show that smoothing improves both computation time and statistical accuracy. Standard errors are computed analytically or by Bayesian bootstrap; for nonindependent and identically distributed sampling, sivqr is compatible with bootstrap. I discuss syntax and the underlying methodology, and I compare sivqr with other commands in an example.
The instrumental variables quantile regression (ivqr) model of Chernozhukov and Hansen (2005) has become popular for modeling causal effects that differ across individuals (or firms, countries, etc.) because of unobserved variables (rather than interactions among observed variables), but the existing ivqr commands, ivqreg (Kwak 2010a) and ivqreg2 (Machado and Santos Silva 2018), have several limitations. First, ivqreg (implementing the estimator of Chernozhukov and Hansen [2006]) allows only a single endogenous term. Besides excluding multiple endogenous regressors, this also excludes interaction or quadratic terms involving the endogenous regressor, for example. Second, ivqreg2 (implementing the estimator of Machado and Santos Silva [2019]) imposes a location–scale model that can help if correct but that can result in inconsistency if misspecified. Machado and Santos Silva (2019, 159) write, “In these linear models, the validity of [our model] depends on assumptions that are stronger than those required by the ivqr but, when these assumptions are valid, [our estimator] has some potential advantages,” like ensuring noncrossing of structural quantile functions. Ideally, sivqr and ivqreg2 could be combined using a “model averaging” approach as in Cheng, Liao, and Shi (2019) or Liu (2019); meanwhile, simply running both may be insightful. Third, the existing commands’ computation time can be long. Fourth, ivqreg and ivqreg2 share some minor inconveniences, like not supporting syntax for factor variables and interaction terms. Fifth, ivqreg is seemingly not actively supported and has additional issues; for example, the newest version 1.0.1 (02dec2010) fixes a problem from version 1.0.0 displaying coefficient labels when using an asterisk in the regressor list (like xvar*), but it introduces an error when there is overidentification (more excluded instruments than endogenous regressors).
To address these limitations, I introduce the sivqr command. The sivqr command allows models with multiple endogenous terms, supports convenient syntax like that for factor variables and interaction terms, and computes a consistent estimator of the ivqr parameters in a reasonable amount of time. The run time in the section 4.1 example is around 25 seconds for sivqr, as opposed to a few minutes (ivqreg) or over 20 minutes (ivqreg2). Although these differences become trivial on small datasets, where everything computes quickly, they become even more important when using bootstrap for cluster–robust standard errors, which requires computing the estimator many times.
The sivqr command implements the smoothed ivqr estimator of Kaplan and Sun (2017), using familiar syntax. Syntax is similar to the ivregress command (see [r] ivregress) but additionally specifies a quantile level. Likewise, syntax is similar to the qreg command (see [r] qreg) but additionally specifies instruments for the endogenous regressors. With nonindependent and identically distributed sampling, the bootstrap prefix can be used for standard errors; see [r] bootstrap and the example in section 4.
Other Stata commands that address endogeneity in quantile models are based on different models than that of Chernozhukov and Hansen (2005). These alternative models have neither “stronger” nor “weaker” assumptions; they may be more plausible in certain settings and less plausible in others. The ivqte command (Frölich and Melly 2010) estimates the unconditional (default) or conditional (option aai) “local quantile treatment effect”, analogous to the local average treatment-effect model of Imbens and Angrist (1994). This conditional (aai) estimand and the ivqr estimand are different but closely related, as explained in section 10.5 of Melly and Wüthrich (2018) and detailed in Wüthrich (2020), both of whom compare the models more generally, too. See also Frölich and Melly (2013) for details. Practically, compared with ivqte with the aai option, the main advantage of sivqr is the ability to handle more than a single, binary endogenous regressor. The cqiv command (Chernozhukov et al. 2019, 2012) uses a control function estimator based on a triangular model (Lee 2007), also allowing for censoring (Chernozhukov, Fernández-Val, and Kowalski 2015). The estimand is the same as for ivqr (see section 2.1). Practically, compared with cqiv, the main advantages of sivqr are handling multiple or noncontinuous endogenous regressors, or both, as well as allowing simultaneity and reverse causality. For further comparison of the ivqr and triangular models, see section 9.2.5 of Chernozhukov, Hansen, and Wüthrich (2018).
Unfortunately, all these commands (including sivqr) require “strong” instruments for valid standard errors. In the future, it would be valuable to have Stata implementations of ivqr inference methods robust to weak instruments, such as those of Chernozhukov and Hansen (2008), Chernozhukov, Hansen, and Jansson (2009), Chernozhukov, Hansen, and Wüthrich (2018, sec. 9.3.3), and references therein. See also Kaplan and Liu (2021) for a “k-class” ivqr estimator that can reduce estimation bias from weak instruments, as well as simulation evidence about the usefulness of the conventional “first-stage” F statistic for assessing instrument strength in ivqr.
Section 2 discusses the methodology at a relatively intuitive level. Section 3 describes syntax and usage of sivqr. Section 4 provides examples that can be replicated with the provided do-file. Section 5 shows some of the theoretical foundations before concluding. Abbreviations are used for instrumental variables (iv), quantile regression (qr), ivqr, and two-stage least squares (2sls).
2 A gentle introduction to methodology
This section discusses methodology at a relatively nontechnical level (compared with section 5).
2.1 Parameter interpretation
First, consider interpretation of the parameters being estimated. For simplicity, imagine a single regressor x and outcome variable y, both scalars. Scalars u and v represent unobserved variables.
Usually, there is a structural model like y = β0 + β1x + v. Here β0 and β1 are unknown constants, and v is everything besides x that causally determines y. Here, β1 has a causal interpretation as some effect of x on y, but in reality we rarely believe such an effect is the same for all individuals (or firms, or schools, etc.). There are two approaches: either let differences in effects go into v and interpret β1 as some sort of average, or try to learn about the effect heterogeneity.
Consider a structural model that allows individuals to each have their own intercept and slope. Because the parameters are now individual-specific, they are not constants like β0 and β1 were, but rather random variables in a “random coefficients” model. The additive error term v from before is simply absorbed into the random intercept, so the structural model is y = b0 + b1x. That is, each individual has their own (y, x, b0, b1), but only (y, x) is observable.
Now, imagine the random coefficients can each be written as deterministic functions of a scalar unobservable u: b0 = β0(u) and b1 = β1(u). Because β0(·) and β1(·) are unrestricted, the distribution of u can be normalized to uniform over the unit interval [0, 1]. The functions β0(·) and β1(·) are unknown but deterministic; evaluated at a fixed 0 < τ < 1, β0(τ) and β1(τ) are unknown constants, just like β0 and β1 were before. The differences across individuals are driven by u. Each individual has his or her own (y, x, u), with y = β0(u) + β1(u)x, whereas functions β0(·) and β1(·) are not specific to any individual.
A special case of this random coefficient model is the usual structural model with constant (nonrandom) coefficients. Let β1(u) = β1, a constant that does not depend on u. Define v ≡ β0(u) − β0, so β0(u) = β0 + v. The function β0(·) can be interpreted as the inverse cumulative distribution function (quantile function) of v, shifted by β0; for example, with Φ(·) as the standard normal cumulative distribution function, v = Φ−1(u) has a standard normal distribution. Then
Some additional restrictions are required to learn about the structural model. Imagine further that given x, β0(u) + β1(u)x is increasing in u. This is known as a “monotonicity” assumption. It also explains why u is often called the “rank variable”: it describes how somebody’s y would rank in the population if everyone were forced to have the same x. For example, somebody with u = 0.5 would have median y, and somebody with u = 0.9 would have 90th percentile y. If everybody keeps the same “rank” regardless of the x value, then “rank invariance” holds. A weaker assumption called “rank similarity” allows the ranking to differ across x as long as the differences are exogenous.
Even if x is endogenous, ivqr can estimate β0(τ) and β1(τ) for any 0 < τ < 1 if an instrument z is available that is related to x but independent of u (Chernozhukov and Hansen 2005). The interpretation of these parameters depends partly on the rank assumption. If rank invariance holds, then β0(τ)+β1(τ)x0 is the y value that somebody with rank u = τ would have if we assign him or her to have value x = x0. Even with the weaker rank similarity assumption, this is the τ-quantile structural function of Imbens and Newey (2009, sec. 3.1): given any x = x0, it provides the τ-quantile of β0(u)+β1(u)x0 over the unconditional population distribution of u (uniform over [0, 1]), which is β0(τ) + β1(τ)x0 because of monotonicity. Similarly, β1(τ) can be interpreted as a τ-quantile treatment effect, capturing the difference in the τ-quantile of y between the counterfactual “untreated” distribution for which everyone has x = x0 and the counterfactual “treated” distribution for which everyone has x = x0 + 1.
2.2 Estimation
Chernozhukov and Hansen (2005) show how to derive moment conditions (or “estimating equations”) to characterize the parameters β0(τ) and β1(τ) given a valid instrument and the assumptions discussed above. For comparison, with v ≡ y − β0− β1x the standard iv moment conditions are
The ivqr moment conditions are
where 1{·} is the indicator function defined as 1{A} = 1 if A is true and otherwise 1{A} = 0. This is implied by a conditional quantile restriction on y − β0(τ) − β1(τ)x (given z), mirroring how the standard iv moments are implied by a conditional mean restriction on y − β0− β1x (given z).
Smoothing solves the computational difficulties inherent in (1). Unlike the standard iv moment conditions, the ivqr moment conditions cannot be solved explicitly for the parameters, nor are the sample moment conditions smooth (differentiable) functions of the parameters. This computational challenge is addressed by “smoothing” the indicator function: replacing it with a function that decreases continuously from 1 to 0 rather than discontinuously jumping from 1 to 0. The smoothed function differs from the indicator function only near the jump point; they are identical outside of a window whose width is controlled by a bandwidth (which asymptotically goes to zero). This smoothing allows the sample system of equations defining the estimator to be solved by standard numerical methods like those available in Mata. As a bonus, smoothing also improves the statistical properties of the estimator in theory and simulations; see sections 5 and 7 of Kaplan and Sun (2017).
3 The sivqr command
The sivqr command estimates the coefficients in an ivqr model, and it provides standard errors.
Syntax, options, and stored results are now shown. The by prefix is allowed (see [d] by). After running, postestimation commands like test and predict (see [r] test and [r] predict) can be used as usual.
As in ivregress, varlist1 has exogenous regressors (or control variables), varlist2 has endogenous regressors, and varlistiv has excluded instruments (exogenous variables that instrument for varlist2).
pweight s, fweight s, and iweight s are allowed; see [u] 11.1.6 weight.
3.2 Options
quantile(#) specifies the quantile level as in qreg(see [r] qreg): a real number strictly between 0 and 1 or, alternatively, a number between 1 and 100 interpreted as a percentile. For example, quantile(0.5) specifies the median, as does quantile(50). To prevent inadvertent mistakes, there is no default; this must be specified explicitly. quantile() is required.
bandwidth(#) manually specifies the desired smoothing bandwidth. Leaving it unspecified invokes a plugin bandwidth based on Kaplan and Sun (2017) as the default. If the desired bandwidth is too small to find a numerical solution, then it is increased until a solution is found. For example, specifying bandwidth(0) requests the smallest bandwidth that is computationally feasible. Generally, the default plugin bandwidth is recommended; compared with an arbitrarily chosen bandwidth, the plugin usually produces a more accurate estimator (lower mean squared error). However, for a final analysis, it is prudent to also try a smaller bandwidth like bandwidth(0) as a sensitivity check (results should be qualitatively similar, though not identical). A smaller bandwidth tends to yield an estimator with larger variance but smaller (closer to zero) bias.
level(#); see [r] Estimation options.
reps(#) specifies the number of bootstrap replications for estimating the variance–covariance matrix and standard errors. reps(0), which is the default, reports heteroskedasticity-robust analytic standard errors. Otherwise, the Bayesian bootstrap of Rubin (1981) is used; it is a valid frequentist bootstrap that also has a nonparametric Bayesian interpretation (see section 5.6).
logiterations prints each iteration of the numerical solver; see [m-5] solvenl( ), in particular solvenl_init_iter_log(). The default is not to print such information, which usually helps only debugging and troubleshooting.
noconstant; see [r] Estimation options.
seed(#) sets the random-number seed to make results replicable. The default is seed(112358). The current seed is restored at the end of execution.
nodots suppresses display of the replication dots (see [r] bootstrap).
initial(matname) sets the initial coefficient values for the numerical search, using the values stored in the matrix (row vector) named matname. By default, qreg is used to generate initial values. This option is particularly helpful when fitting the same model across many quantile levels; examples are given in sivqr_examples_init.do.
3.3 Stored results
sivqr stores the following in e():
Almost all the above are the same as for the familiar commands ivregress and qreg (or bsqreg). The only exceptions are for the smoothing bandwidth. By default, the plugin bandwidth is computed and returned in e(bwidth_req). More specifically, three plugin bandwidths are computed (see section 5.4), of which e(bwidth_req) is the minimum and e(bwidth_max) is the maximum. If the numerical solver cannot find the solution with e(bwidth_req) because it is too close to zero, then the bandwidth is increased until the numerical solver finds the solution. This feasible bandwidth is returned in e(bwidth). There is nothing wrong with these values being different; for example, specifying bandwidth(0) makes e(bwidth_req) zero and e(bwidth) the smallest possible bandwidth for which the numerical solver finds a solution.
If you choose to use the default plugin bandwidth and if e(bwidth) is greater than e(bwidth_max), then there may be deeper problems such as weak instruments. Although there is currently no weak instrument test specific to ivqr, it would be helpful to run a weak instrument test for 2sls in this case, such as running ivregress and then estat firststage. If e(bwidth) is above e(bwidth_req) but less than e(bwidth_max), then it may be simply that the plugin bandwidth was chosen too conservatively (see section 5.4), but it may still be worth running ivregress and estat firststage.
These are the standard values stored by ereturn display; see [P] ereturn.
4 Example: Example 4 from ivregress
All results (besides run times) can be replicated with the provided sivqr_examples.do file. I ran them in Stata/MP 17.0 with 2 cores on Windows 10, on a few-yearsold standard-issue Dell desktop computer (Intel i5-8500 3GHz processor, 8gb ram). Besides using version 1.1.0 (22jan2022) of sivqr, I used version 2.6 (29sep2021) of ivqreg2 from Statistical Software Components Archive (Machado and Santos Silva 2018) and version 1.0.0 (28jul2010) of ivqreg (Kwak 2010a). The most recent version 1.0.1 (02dec2010) from Kwak (2010b) unfortunately gives an error when trying to use an overidentified model (as in section 4.1), “st_numscalar(): 3204 matrix found where scalar required.” Other changes in version 1.0.1 are both positive and negative; when I modify the section 4.1 model to have only union as an excluded instrument, then version 1.0.1 reports standard errors, whereas version 1.0.0 does not (instead reporting “Warning: variance matrix is nonsymmetric or highly singular”), although the version 1.0.1 run time is over 10 times longer.
The primary focus is comparison with ivqreg and ivqreg2, in terms of estimates, speed, and capabilities, including the following:
sivqr with a small bandwidth produces estimates very similar to ivqreg;
ivqreg2 can differ from ivqreg and sivqr, though is qualitatively similar;
ivqreg2 can be significantly slower than sivqr.
As noted earlier, ivqreg does not allow multiple endogenous regressors, which precludes even a quadratic function of the endogenous regressor, and the factorial operator ## is supported only by sivqr.
4.1 Main comparison
The following revisits example 4 in [r] ivregress. Briefly, it hopes to fit a structural wage model, treating job tenure as endogenous; see [r] ivregress for details. I focus on the coefficient on job tenure.
For comparison, ivregress 2sls is run first. This provides a helpful “sanity check”: even if there are differences across quantile levels, the ivqr coefficient estimates and signs should still be similar overall, especially around the median.
The following shows results from sivqr at the median level. To cluster standard errors at the individual level, we use the bootstrap prefix after using xtset appropriately.1 With bootstrap, it would only waste time (without changing the reported estimates or standard errors) to specify the reps() option (with a nonzero number) for sivqr. At the median, the sivqr coefficient estimates (other than the intercept) as well as the corresponding cluster–robust standard errors are similar to those from ivregress 2sls.
The following code produces the results in tables 1 and 2, besides the cluster– bootstrap standard errors computed above (that are repeated for other quantiles). Compared with the provided do-file, below I omit various timer and matrix storage commands (and output). I run sivqr with both the default plugin bandwidth as well as bandwidth(0), which tries to find the smallest computationally feasible bandwidth. The squared age variable is generated manually because ivqreg and ivqreg2 do not support the factorial operator ##.
Estimates of tenure coefficient; clustered standard errors
Quantile
sivqr
sivqr, b(0)
ivqreg
ivqreg2
0.25
0.0865756 (0.0031621)
0.0860257
0.0861186
0.0933732
0.50
0.1076941 (0.0046079)
0.1080343
0.1032661
0.1084423
0.75
0.1565857 (0.0103318)
0.1553029
0.1584188
0.1250514
note: 2SLS estimate (std. err.) is 0.1061 (0.0045).
Runtime (seconds) of ivqr commands on nlswork data
Quantile
bootstrap:sivqr
sivqr
sivqr, b(0)
ivqreg
ivqreg2
0.25
861
16
98
130
↑
0.50
500
5
68
68
1304
0.75
1026
4
47
153
↓
Table 1 shows the different tenure coefficient estimates at different quantiles, with the 2sls results for comparison. To help gauge the magnitude of the differences, table 1 also shows cluster–robust standard errors from 2sls and from using bootstrap with sivqr. (None of the commands produces cluster–robust standard errors by itself, so the default standard errors are not reported; they are generally too small here, as expected.)
At the median, all estimates are similar. All are well within one standard error of the 2sls estimate. (The 2sls estimate can differ greatly from median ivqr estimates, just as the sample mean and median can differ greatly, but they are often similar.) The cluster–bootstrap sivqr standard error is also similar to the 2sls cluster–robust standard error.
Away from the median, sivqr (either bandwidth) and ivqreg remain similar to each other, but other differences arise. First, the ivqr estimates are all increasing with the quantile level, differing significantly from the 2sls estimate both economically and statistically. Compared with the 0.25 quantile estimates, most of the 0.75 quantile estimates are nearly twice as large, suggesting substantial heterogeneity. This could be interpreted as individuals with higher levels of “ability” (and other unobserved wage determinants) having higher returns to job tenure.
Second, ivqreg2 differs from the other ivqr estimators. This is especially seen at the 0.75 quantile: the ivqreg2 estimate of the return to tenure is below the other three estimates by around 0.03 (three percentage points), which is around a three standarderror difference. It is possible that the location–scale model of ivqreg2 is correctly specified and that this difference is only due to larger estimation error of sivqr and ivqreg, but given the consistency of both sivqr and ivqreg along with the difference being three standard errors, it seems likely that at least part of the difference is explained by misspecification of the ivqreg2 location–scale model.
As a side note, these ivqreg2 results were using the regressor order that gave the most economically plausible estimates. If we keep everything the same but with tenure last in the regressor list, the estimates of the tenure coefficient become significantly negative, and two warning messages appear: “WARNING: .03758389% of the fittedvalues of the scale function are not positive” and “Warning: variance matrix is nonsymmetric or highly singular” (with missing standard errors).
Table 2 shows the run time of the different commands at different quantiles. The fastest is sivqr with the default plugin bandwidth and analytic standard errors, taking around 25 seconds to compute at all 3 quantiles. Using bandwidth(0) is significantly slower (a few minutes total, instead of seconds) because it requires many more calls to the numerical solver, trying to find the very smallest usable bandwidth. The ivqreg time is even slower than bandwidth(0), also taking multiple minutes total. Naturally, running a cluster bootstrap with 100 replications takes even longer, even using the fast sivqr command with plugin bandwidth, taking around 40 minutes total. Even without bootstrap, ivqreg2 takes over 20 minutes, an order of magnitude more than the 25 seconds taken by sivqr.
4.2 More computation times
The provided do-file illustrates some additional important considerations about computation time. Most of the sivqr computation time is from calling the solvenl() numerical solver. Naturally, some numerical problems are more difficult to solve quickly. In particular, difficulty is higher with less smoothing or when there is less data near the quantile of interest (roughly speaking), which most often happens farther into the tails at quantile levels near zero or one.
Table 3 illustrates this by running sivqr over quantiles 0.10, 0.15,…, 0.90, using either the plugin bandwidth (always between 0.05 and 0.08) or bandwidth(0) (always between 0.00001 and 0.00012, except at the 0.9 quantile). First, although there are other factors, run time is generally longer in the tails: with the plugin bandwidth, run time is at least 8 seconds at quantile levels {0.10, 0.80, 0.85, 0.90} but under 8 seconds in between. The pattern is less clear for bandwidth(0) because it depends not only on the run time per solvenl() call but also on the number of such calls. Second, the run time is longer with the smaller bandwidth at 16 out of 17 quantile levels.
Runtime (seconds) across quantile levels, nlswork data
Quantile
plug-in
bandwidth(0)
0.10
16
99
0.15
3
51
0.20
6
49
0.25
5
88
0.30
2
82
0.35
2
54
0.40
3
46
0.45
4
25
0.50
4
61
0.55
4
29
0.60
3
27
0.65
2
29
0.70
5
38
0.75
4
44
0.80
13
36
0.85
53
44
0.90
8
43
Also, even with the same original dataset, some bootstrap datasets are naturally more difficult than others, so bootstrap run time can depend on the random seed. With quantile(0.5), out of 100 runs with different seeds, run times for the built-in sivqr bootstrap with 20 replications ranged from 9 seconds to 30 seconds, with lower and upper quartiles 11 seconds and 16 seconds. Among 5 runs with the bootstrap prefix (also 20 replications), run times ranged from 79 to 90 seconds. (I conjecture this is much slower because it resamples a new dataset rather than merely reweighting the original dataset like the Bayesian bootstrap of sivqr, but I have not broken down the run time by function to investigate further.) However, it cannot be known beforehand which seeds are “faster” for a particular dataset, so using the default seed is recommended, which also avoids any appearance of trying to manipulate results.
5 Methods and formulas
This section contains additional theoretical details. After introducing notation in section 5.1, section 5.2 concerns the difficulty of estimation and the solution of Kaplan and Sun (2017), while section 5.3 has some of the identification arguments from Chernozhukov and Hansen (2005) that characterize causal parameters as solutions to moment conditions (estimating equations). Section 5.4 describes various bandwidths. Section 5.5 shows how to compute analytic standard errors. Section 5.6 has brief notes on the Bayesian bootstrap used to compute standard errors.
These details may provide a deeper understanding for some readers, but they may also be skipped without hindering successful application of sivqr in practice.
5.1 Notation
Notationally, let y be the scalar outcome variable, x a column vector of regressors, and u a scalar unobserved variable. Let d be a subset of x containing all the endogenous regressors. Let z be a column vector of all exogenous variables: both exogenous regressors in x as well as the excluded instruments (iv). The scalar random variable u is unobserved. Following Stata Journal convention, vectors are lowercase bold (like x) and scalars are lowercase plain (like y) because uppercase is reserved for matrices, but with some luck, random variables can be distinguished from their possible nonrandom realizations.
Additional notation is introduced as needed below.
5.2 Estimation
As a helpful reference point for intuition, recall the standard (“mean”) iv model and estimator. The goal is to estimate the nonrandom parameter vector β in the structural model
If the instruments z are independent of the structural error u, then
where the ⇒ follows from the law of iterated expectations. If the dimensions of z and x are the same (“exact identification”), then this can be solved explicitly for β,
and (with independent and identically distributed [i.i.d.] data) the expectations are replaced by sample averages to yield the familiar iv estimator.
Superficially, imagine replacing the conditional mean restriction E(u |z) in (2) with a conditional τ-quantile restriction for some 0 < τ < 1:
That is, conditional on any value of z, the conditional distribution of u has its τ-quantile equal to 0. By definition, the τ-quantile of a distribution is the value with τ probability below that value, and the same is true conditional on z. Thus, (3) is equivalent to
Let 1{·} be the indicator function such that 1{A} = 1 if A is true and otherwise 1{A} = 0. Rewriting Pr(·) as E[1{·}] and applying the law of iterated expectations as in (2), we obtain
Despite some similarities, computing an ivqr estimator based on (4) is much more difficult than computing the mean iv estimator based on (2). With mean iv, it was possible to solve for β and replace the mean (expectation) with the sample average. Here the β is stuck inside the indicator function 1{·}, so it cannot be solved for explicitly. Further, it may be impossible to solve the equation exactly after substituting in the sample average.
These difficulties are both addressed by “smoothing” the indicator function in (4). Replacing 1{·} with a very similar but continuously differentiable function allows the sample system of equations to be solved quickly by standard numerical methods. Specifically, replacing
is a function of v that smoothly decreases from 1 to 0 over −1 ≤ v ≤ 1 instead of decreasing discontinuously from 1 to 0 at v = 0.2 Adding a bandwidth h allows to decrease more steeply over −h ≤ v ≤ h. The estimator
solves the “smoothed estimating equations” (or “smoothed moment conditions”)
As a bonus, this smoothing also improves estimation precision, as shown both theoretically and in simulations by Kaplan and Sun (2017, sec. 5 and 7). Weights wi can be inserted into (5) as needed:
Specifically, sivqr uses a piecewise linear that connects the smoothed ivqr estimator to other estimators. Instead of jumping down from 1 to 0 discontinuously,
has
for v ≤ −1 and
for v ≥ 1, but transitions linearly with
for −1 < v < 1. Kaplan and Sun (2017, sec. 2.2, 111) show how this choice produces the Winsorized mean estimator of the type in Huber (1964, ex. (iii), 79) in the special case with τ = 0.5 and an intercept-only model (x = z = 1). They also show (in section 2.2) how using a very large amount of smoothing (h) turns the smoothed ivqr estimator into the usual (mean) iv estimator, with an adjusted intercept. Intuitively, if h is large enough that every observation is smoothed, then
is a linear function of the residuals , just as in the mean iv moment conditions; if τ ≠ 0.5, then the intercept is different, but the slope estimates are identical to the iv slope estimates for any τ. Of course, in practice h is small, but this shows that the worst-case effect of choosing h too large is that you simply get the usual iv slope estimates.
If there are more instruments than parameters (overidentification), then a different z is used in (5). Specifically, it is replaced by the linear projection of the regressor vector x onto the instruments z, which has the same dimension as x and thus β. This linear projection is motivated by the 2sls estimator, which can also be written as the mean iv estimator when replacing z with the linear projection of x onto z. Although theoretically more efficient estimators may exist, this estimator remains consistent, reliable, and fast to compute regardless of the degree of overidentification.
5.3 Identification
It remains to motivate the conditional quantile restriction in (3) from a causal model, as originally done by Chernozhukov and Hansen (2005).3 Instead of assuming every individual (or firm, or school, etc.) has the same coefficients β, imagine each individual has his or her own coefficient vector b. This is a “random coefficient” model, meaning b can differ by individual the same way that y, x, and z do:
An additional error term would be redundant; for example, if y = b0 + b1x + v, then the random intercept can simply absorb v to become b0 + v. To make the model more tractable empirically, imagine b can be written as a deterministic (but unknown) vectorvalued function β(·) applied to a scalar unobserved u:
Assuming u is continuous, it can be normalized to have a Unif(0, 1) distribution (uniformly distributed between 0 and 1): any transformation is simply absorbed into β(·). Assume a “monotonicity” property that given any value of x, x′β(u) is an increasing function of u, and like before assume the instruments are independent of u. Then,
That is, for any u = τ, β(τ) is identified by the conditional quantile restriction from (3), which can be estimated by the sivqr command using (5). Given τ, the function qτ (x) = x′β(τ) is also the τ-quantile structural function introduced by Imbens and Newey (2009, sec. 3.1).
5.4 Plugin bandwidth
This section applies only to the plugin bandwidth. If instead the bandwidth() option is manually specified, then none of the following applies.
Kaplan and Sun (2017, prop. 2, 117) provide a theoretical “optimal” smoothing bandwidth. Specifically, it minimizes the asymptotic mean squared error of the smoothed estimating equations (moment conditions) themselves. It also minimizes the asymptotic mean squared error of a particular linear combination of the estimated coefficients, although it does not do so for every possible linear combination; see section 5 of Kaplan and Sun (2017).
A plugin (data-dependent) version of the optimal bandwidth from Kaplan and Sun (2017) is implemented in sivqr as follows. First, as in their proposition 2, the bandwidth is simplified by assuming v ≡ y−x′β(τ) is independent of the full vector of instruments. Second, the smoothed indicator function is piecewise linear, as mentioned in section 5.2. In the notation of Kaplan and Sun (2017),
Third, the smallest amount of underlying smoothness in the data-generating process is assumed, r = 2 (see Kaplan and Sun [2017, ass. 3]).
With the chosen G(·) in (8) and r = 2, simplifying the optimal bandwidth h∗ from proposition 2 of Kaplan and Sun (2017) with v ⊥ z,
where fv(·) is the probability density function of v and
its first derivative and d is the dimension (length) of β(τ).
The sivqr plugin bandwidth is the minimum of a few different values. If one takes the minimum (instead of average or maximum), mistakes tend to be in the direction of undersmoothing compared with the “optimal” smoothing that minimizes mean squared error. This results in lower bias (but higher variance) than is optimal, because smoothing reduces variance at the cost of increased bias. The minimum is taken among a nonparametrically estimated version of h∗, a Gaussian reference version, and the Silverman rule-of-thumb bandwidth (Silverman 1986, sec. 3.4.2) for estimating the density of v, as suggested by Fernandes, Guerre, and Horta (2021) for non-iv smoothed qr. The first two are detailed further below.
After one finally uses the plugin bandwidth to compute the smoothed ivqr estimator, the residuals
are then recomputed from the new
, leading to an updated plugin bandwidth, and the resulting ivqr estimate is reported by sivqr.
5.4.1 Nonparametrically estimated bandwidth
One approach is to replace fv(0) and in h∗ by nonparametric kernel estimates. Because v is unobserved, an initial estimate
must be used to construct residuals,
For f(0) (dropping the v subscript for now), given the
, the usual kernel density estimator is
where s is the kernel bandwidth (not to be confused with the ivqr smoothing bandwidth h) and the kernel function K(·) is chosen to be the Gaussian kernel. For s, a pointwise version of Silverman’s rule of thumb (Silverman 1986, sec. 3.4.2) is used because interest is in f(0), not the full function f(·), so it is better to minimize the pointwise mean squared error than the integrated mean squared error. The (asymptotic) pointwise mean squared error and optimal bandwidth can be found in DasGupta (2008, ch. 32, 536), for example:
Following the Gaussian reference approach of Silverman (1986, sec. 3.4.2), assume f(·) is the density of a N(µ, σ2) distribution; that is, f(x) = φ{(x − µ)/σ}/σ, and the corresponding cumulative distribution function is F (x) = Φ{(x − µ)/σ}, where φ(·) and Φ(·) are respectively the density and distribution functions of the standard normal distribution. Given the restriction that the τ-quantile of v is 0,
Thus, the density at 0 is
Similarly, assuming normality allows us to express in terms of . The second derivative of the N(µ, σ2) density evaluated at 0 is4
Finally, plugging everything into the formula for s∗, we see
Following Silverman [1986, (3.30)] to get a feasible bandwidth , the unknown σ is replaced by either the sample standard deviation of the or their sample interquartile range divided by 1.349 (the standard normal interquartile range), whichever is smaller. This is then used to compute using (9).
For the density derivative [here simply f′(0)], the approach is the same: use a nonparametric kernel estimator with a Silverman-type bandwidth. The usual estimator is
where now b is the bandwidth (to avoid confusion with h and s). The asymptotic mean squared error for this estimator at the point 0 is from Wand and Jones (1995),6
where as in (11). Solving the first-order condition (setting the derivative with respect to b equal to 0) leads to the optimal bandwidth
Plugging in the standard normal density for K(·) yields
Again, to get a feasible bandwidth , one replaces σ with the sample standard deviation or interquartile range divided by 1.349, whichever is smaller. This is then used to compute using (12).
Finally, and are plugged into the expression for h∗ to get the feasible plugin bandwidth .
5.4.2 Gaussian reference bandwidth
Here the Gaussian reference approach is used directly for h∗ to simplify the density and density derivative of v.
From (10), assuming normality of v yields f(0) = σ−1φ{Φ−1(τ)}.
Assuming normality, the first derivative of the density of v evaluated at zero is8
Plugging this into the expression for h∗ yields
and as usual σ is replaced by the smaller of the sample standard deviation of the or the sample interquartile range of the divided by 1.349.
5.5 Standard errors
By default or with reps(0), analytic standard errors are computed as follows; otherwise, bootstrap is used as in section 5.6. The formulas here are special cases of (3.11) of Chernozhukov and Hansen (2006); (18), (23), and theorem 5 of de Castro et al. (2019); or theorem 7 of Kaplan and Sun (2017). For notational simplicity, I leave the dependence on τ implicit. Define an additive error term
The first-order asymptotic distribution of the sivqr estimator is
where is the conditional probability density functions of ε given (z, x) evaluated at ε = 0. The expression J−1S(J′)−1 is mathematically equivalent but requires inverting the asymmetric matrix J by itself. The estimated asymptotic covariance matrix is
using a conventional kernel density estimator for with a Gaussian kernel φ(·) and bandwidth h being Silverman’s rule of thumb (Silverman 1986, sec. 3.4.2), specifically 1.06n−1/5 times the minimum of either the sample standard deviation of the or their sample interquartile range divided by 1.349. Alternatively, the bandwidth in (A.8) of Liu (2019) would also work well. Note is a Powell (1986)-type estimator, similar to (3.14) of Chernozhukov and Hansen (2006) but here with a Gaussian instead of uniform kernel. The reported standard errors are then the square roots of the diagonals of the matrix .
5.6 Bayesian bootstrap
When reps(#) specifies multiple bootstrap replications, standard errors are computed by Bayesian bootstrap (Rubin 1981), assuming i.i.d. sampling. (With non-i.i.d. sampling, bootstrap can be used; see [r] bootstrap.) This is a particular type of frequentist exchangeable bootstrap (for example, van der Vaart and Wellner [1996, ex. 3.6.9, 354]) that also has a nonparametric Bayesian interpretation (for example, Chamberlain and Imbens [2003] advocate for Bayesian bootstrap and show iv and qr applications). In each replication, weights are set as where the ξi are i.i.d. standard exponential random variables and is their average; equivalently, the vector (w1,…, wn)/n follows a Dirichlet distribution with every parameter equal to 1. Using these wi, one computes the estimator by solving (6). This is repeated many times; the reported standard error is the standard deviation of the different .
6 Conclusion
The sivqr command offers smoothed estimation of the ivqr model of Chernozhukov and Hansen (2005). The smoothing improves computation and accuracy, helping sivqr to overcome several limitations of ivqreg and ivqreg2, while complementing commands for alternative quantile models with endogeneity (cqiv and ivqte). The sivqr command can help Stata users reliably estimate heterogeneous effects across a variety of settings and datasets.
Supplemental Material, sj-zip-1-stj-10.1177_1536867X221106404 for Smoothed instrumental variables quantile regression by David M. Kaplan in The Stata Journal
Footnotes
7 Acknowledgments
There would be no sivqr command without the work of Yixiao Sun on . Thanks to Xin Liu for research assistance. Thanks to Joao Santos Silva for suggesting the initial() option. Thanks to Di Liu for several helpful suggestions and encouragement, including e(bwidth_max) and analytic standard errors. Many thanks to a particularly thorough anonymous reviewer who found my mistakes and helped improve both the content and presentation tremendously.
8 Programs and supplemental materials
To install a snapshot of the corresponding software files as they existed at the time of publication of this article, type
ChamberlainG.ImbensG. W.2003. Nonparametric applications of Bayesian inference. Journal of Business and Economic Statistics21: 12–18. https://doi.org/10.1198/073500102288618711.
3.
ChengX.LiaoZ.ShiR.2019. On uniform asymptotic risk of averaging gmm estimators. Quantitative Economics10: 931–979. https://doi.org/10.3982/QE711.
4.
ChernozhukovV.Fernández-ValI.HanS.KowalskiA.2012. cqiv: Stata module to perform censored quantile instrumental variables regression. Statistical Software Components S457478, Department of Economics, Boston College. https://ideas.repec.org/c/boc/bocode/s457478.html.
5.
ChernozhukovV.Fernández-ValI.HanS.KowalskiA.2019. Censored quantile instrumental-variable estimation with Stata. Stata Journal19: 768–781. https://doi.org/10.1177/1536867X19893615.
6.
ChernozhukovV.Fernández-ValI.KowalskiA. E.2015. Quantile regression with censoring and endogeneity. Journal of Econometrics186: 201–221. https://doi.org/10.1016/j.jeconom.2014.06.017.
FrölichM.MellyB.2013. Unconditional quantile treatment effects under endogeneity. Journal of Business and Economic Statistics31: 346–357. https://doi.org/10.1080/07350015.2013.803869.
16.
HuberP. J.1964. Robust estimation of a location parameter. Annals of Mathematical Statistics35: 73–101.
17.
ImbensG. W.AngristJ. D.1994. Identification and estimation of local average treatment effects. Econometrica62: 467–475. https://doi.org/10.2307/2951620.
18.
ImbensG. W.NeweyW. K.2009. Identification and estimation of triangular simultaneous equations models without additivity. Econometrica77: 1481–1512. https://doi.org/10.3982/ECTA7108.
LiuX.2019Averaging estimation for instrumental variables quantile regression. Working Paper 1907, Department of Economics, University of Missouri. https://ideas.repec.org/p/umc/wpaper/1907.html.
26.
MachadoJ. A. F.Santos SilvaJ. M. C.2018. ivqreg2: Stata module to provide structural quantile function estimation. Statistical Software Components S458571, Department of Economics, Boston College. https://ideas.repec.org/c/boc/bocode/s458571.html.
MellyB.WüthrichK.2018. Local quantile treatment effects. In Handbook of Quantile Regression, ed. KoenkerR.ChernozhukovV.HeX.PengL., 145–164. Handbooks of Modern Statistical Methods, Boca Raton, fl: Chapman & Hall/CRC. https://doi.org/10.1201/9781315120256.
SilvermanB. W.1986. Density Estimation for Statistics and Data Analysis. Boca Raton, FL: Chapman & Hall/CRC.
32.
van der VaartA. W.WellnerJ. A.1996. Weak Convergence and Empirical Processes: With Applications to Statistics. New York: Springer. https://doi.org/10.1007/978-1-4757-2545-2.
33.
WandM. P.JonesM. C.1995. Kernel Smoothing. New York: Chapman & Hall/CRC.
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.