The LOOP Estimator: Adjusting for Covariates in Randomized Experiments

Abstract

Background:

When conducting a randomized controlled trial, it is common to prespecify the statistical analyses that will be applied to the data. Typically, these analyses involve adjusting for small imbalances in baseline covariates. This poses a dilemma: adjusting for too many covariates can hurt precision more than it helps, yet it is often unclear before the experiment is conducted which covariates are predictive of the outcome.

Objectives:

This article aims to produce a covariate adjustment method that allows for automatic variable selection, so that practitioners need not commit to any specific set of covariates prior to seeing the data.

Results:

In this article, we propose the “leave-one-out potential outcomes” (LOOP) estimator. We leave out each observation and then impute that observation’s treatment and control potential outcomes using a prediction algorithm such as a random forest. In addition to allowing for automatic variable selection, this estimator is unbiased under the Neyman–Rubin model and generally performs at least as well as the unadjusted estimator; the statistical assumptions it makes are largely justified by the experimental randomization.
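The leave-one-out procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the article's implementation: it assumes Bernoulli assignment with a known treatment probability `p`, a single covariate, and a simple least-squares line in place of the random forest mentioned in the text. The names `fit_line` and `loop_estimate` are hypothetical. The key property is that the imputed values for observation `i` are built without using observation `i`, so they do not depend on its own treatment assignment.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Fit y ~ a + b*x by least squares and return a prediction function.

    Stand-in for the random forest named in the text (an assumption of this
    sketch); any predictor works if it is trained without observation i.
    """
    if not ys:
        return lambda x: 0.0  # no data in this arm: fall back to a constant
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((xv - x_bar) ** 2 for xv in xs)
    if sxx == 0:
        return lambda x: y_bar  # covariate does not vary: predict the mean
    slope = sum((xv - x_bar) * (yv - y_bar) for xv, yv in zip(xs, ys)) / sxx
    return lambda x: y_bar + slope * (x - x_bar)


def loop_estimate(y, t, x, p, fit=fit_line):
    """Leave-one-out estimate of the average treatment effect.

    y: observed outcomes, t: 0/1 treatment indicators, x: one covariate,
    p: known probability of assignment to treatment (assumed Bernoulli).
    """
    n = len(y)
    total = 0.0
    for i in range(n):
        # Training data excludes observation i.
        treated = [j for j in range(n) if j != i and t[j] == 1]
        control = [j for j in range(n) if j != i and t[j] == 0]
        # Impute both potential outcomes for observation i.
        t_hat = fit([x[j] for j in treated], [y[j] for j in treated])(x[i])
        c_hat = fit([x[j] for j in control], [y[j] for j in control])(x[i])
        # Weighted combination of the imputed potential outcomes.
        m_hat = (1 - p) * t_hat + p * c_hat
        # Inverse-probability weight with the sign of the assignment.
        u = 1 / p if t[i] == 1 else -1 / (1 - p)
        total += (y[i] - m_hat) * u
    return total / n


# Usage on simulated data (all values here are made up for illustration).
random.seed(0)
n, p = 200, 0.5
x = [random.gauss(0, 1) for _ in range(n)]
t = [1 if random.random() < p else 0 for _ in range(n)]
y = [1.0 + 2.0 * xi + 3.0 * ti + random.gauss(0, 1) for xi, ti in zip(x, t)]
print(loop_estimate(y, t, x, p))
```

Because the imputation for observation `i` is independent of its own assignment, the weight `u` has mean zero regardless of how good the predictor is, which is why a poor predictor costs precision rather than introducing bias in this sketch. Swapping in a random forest or another learner only requires passing a different `fit` function with the same interface.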

Keywords

causal inference, covariate adjustment, potential outcomes, randomized trials
