Abstract
Multiple imputation is commonly used when data are missing, but analysts must choose among many imputation methods. Multivariate imputation by chained equations (MICE) is a popular method for generating imputations, but it requires specifying a model for each variable with missing values. In this work, we introduce multiple imputation by super learning, an update to the multivariate imputation by chained equations method that generates imputations using ensemble learning. Ensemble methods have recently gained attention for inference and prediction because they optimally combine a variety of user-specified parametric and non-parametric models and perform well when estimating complex functions, including those with interaction terms. Through two simulations, we compare inferences made using multiple imputation by super learning with those made using other commonly used multiple imputation methods, and we show that multiple imputation by super learning performs better with respect to bias, confidence interval coverage rate, and confidence interval width.
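The general idea of chained-equations imputation with an ensemble learner can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The candidate library (a mean, ordinary least squares, and k-nearest neighbours), the projected-gradient fit of convex ensemble weights to cross-validated predictions, and the bootstrap-plus-resampled-residual step that makes each imputation random are all illustrative assumptions, and every function name is hypothetical. The M completed datasets are combined with Rubin's rules.

```python
import numpy as np

# Hypothetical candidate library: each learner maps (X, y) -> predict(Xn).
def mean_learner(X, y):
    m = float(np.mean(y))
    return lambda Xn: np.full(len(Xn), m)

def ols_learner(X, y):
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ beta

def knn_learner(X, y, k=5):
    def predict(Xn):
        # Squared Euclidean distance from each new row to each training row.
        d = ((Xn[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        nearest = np.argsort(d, axis=1)[:, :k]
        return y[nearest].mean(axis=1)
    return predict

def project_simplex(v):
    # Euclidean projection onto {w : w >= 0, sum(w) = 1}.
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css - 1)[0][-1]
    theta = (css[rho] - 1) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def super_learner(X, y, learners, rng, folds=5, steps=500):
    # Cross-validated predictions Z[i, j] from every candidate learner.
    n = len(y)
    fold_id = rng.permutation(n) % folds
    Z = np.zeros((n, len(learners)))
    for f in range(folds):
        train, test = fold_id != f, fold_id == f
        for j, learn in enumerate(learners):
            Z[test, j] = learn(X[train], y[train])(X[test])
    # Convex weights minimizing CV squared error via projected gradient descent.
    w = np.full(len(learners), 1.0 / len(learners))
    step = n / (np.linalg.norm(Z, 2) ** 2 + 1e-12)
    for _ in range(steps):
        w = project_simplex(w - step * (Z.T @ (Z @ w - y)) / n)
    fits = [learn(X, y) for learn in learners]
    return w, lambda Xn: np.column_stack([fit(Xn) for fit in fits]) @ w

def impute_once(data, learners, rng, iters=5):
    # Start from column means, then cycle through incomplete columns,
    # refitting the ensemble on the current completed data each time.
    miss = np.isnan(data)
    data = np.where(miss, np.nanmean(data, axis=0), data)
    for _ in range(iters):
        for j in np.nonzero(miss.any(axis=0))[0]:
            obs_rows = np.nonzero(~miss[:, j])[0]
            X = np.delete(data, j, axis=1)
            # Assumed stochastic step: bootstrap observed rows, then add a
            # resampled residual so the M imputations differ from each other.
            boot = rng.choice(obs_rows, size=len(obs_rows), replace=True)
            _, predict = super_learner(X[boot], data[boot, j], learners, rng)
            resid = data[obs_rows, j] - predict(X[obs_rows])
            m_j = miss[:, j]
            data[m_j, j] = predict(X[m_j]) + rng.choice(resid, size=m_j.sum())
    return data

def rubin_pool(estimates, variances):
    # Rubin's rules: pooled estimate and total variance.
    m = len(estimates)
    qbar = float(np.mean(estimates))
    within = float(np.mean(variances))
    between = float(np.var(estimates, ddof=1))
    return qbar, within + (1 + 1 / m) * between

# Usage on simulated data with an interaction term and MAR missingness in y.
rng = np.random.default_rng(0)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = x1 + x1 * x2 + rng.normal(scale=0.5, size=n)
missing_y = rng.random(n) < 0.6 / (1 + np.exp(-x1))
data = np.column_stack([x1, x2, np.where(missing_y, np.nan, y)])

learners = [mean_learner, ols_learner, knn_learner]
estimates, variances = [], []
for _ in range(5):
    completed = impute_once(data, learners, rng)
    estimates.append(completed[:, 2].mean())
    variances.append(completed[:, 2].var(ddof=1) / n)
qbar, total_var = rubin_pool(estimates, variances)
print(f"pooled mean of y: {qbar:.3f} (SE {np.sqrt(total_var):.3f})")
```

Because the chained-equations loop only calls `super_learner`, the candidate list can be changed without touching the imputation logic; that separation is the main design choice in this sketch.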
