Abstract
We consider variable selection when the predictor variables have missing values. We compare complete-case analysis with multiple imputation, using backward selection (backward stepping) and least angle regression to select variables. The methods are studied on a data set from a rheumatological disease (myositis). We find that the coefficients differ slightly and that the estimated standard errors are smaller in the complete cases, which is not a surprise. This seems to occur because the complete cases are more homogeneous than the full data, as indicated by their small estimated residual variance.
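To make the standard-error comparison concrete, the following is a minimal sketch of how a coefficient and its standard error are commonly pooled across imputed data sets using Rubin's rules. The abstract does not state which pooling method was used, so treating it as Rubin's rules is an assumption; the function name `pool_rubin` and its arguments are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one coefficient across m imputed data sets (Rubin's rules).

    Assumption: the abstract does not name a pooling method; this is the
    standard textbook approach, shown for illustration only.
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need matching lists from at least two imputations")
    pooled_estimate = mean(estimates)
    # Average of the squared standard errors: within-imputation variance.
    within = mean([se ** 2 for se in std_errors])
    # Sample variance (m - 1 denominator) of the estimates: between-imputation variance.
    between = variance(estimates)
    total = within + (1 + 1 / m) * between
    return pooled_estimate, sqrt(total)
```

The between-imputation term adds the uncertainty due to imputation to the pooled standard error, a component that a complete-case fit does not carry.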
