Abstract
Background/Aims:
Multiple imputation is often recommended over complete case analysis for handling missing data in clinical trials due to its ability to recover information from participants with incomplete data. While multiple imputation is generally held to be more efficient than complete case analysis, its planned use in clinical trials is typically not considered during sample size estimation. The standard approach of inflating the sample size for anticipated loss to follow-up is appropriate for complete case analysis, but it could lead to excess power, and hence inefficient resource use, if multiple imputation is planned for the analysis. In this article, we systematically reviewed published clinical trials with the aim of quantifying the precision advantages of multiple imputation over complete case analysis in treatment effect estimation, hence informing sample size planning for future trials.
Methods:
We conducted a targeted review of clinical trials published between January 2019 and December 2023 in The Lancet, The BMJ, Journal of the American Medical Association and New England Journal of Medicine. Clinical trials were eligible for inclusion if point and variance estimates for the effect of treatment on a primary efficacy or safety outcome could be determined for both multiple imputation and complete case analysis. The design effect due to multiple imputation was calculated as the variance of the treatment effect estimate under multiple imputation divided by the corresponding variance under complete case analysis. As a supplementary analysis, we also conducted an untargeted review of other journals by searching PubMed for clinical trials with the keywords ‘imputation’ or ‘imputed’ in their title or abstract.
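The design effect described above can be illustrated with a short sketch: the multiple imputation variance is the total variance of the pooled estimate under Rubin's rules, and the design effect is its ratio to the complete case variance. All numbers below are hypothetical, not taken from any reviewed trial.

```python
# Illustrative sketch (hypothetical numbers): design effect due to
# multiple imputation for a treatment effect estimate.

def rubin_pooled_variance(estimates, variances):
    """Total variance of the pooled estimate under Rubin's rules."""
    m = len(estimates)
    qbar = sum(estimates) / m                                # pooled point estimate
    ubar = sum(variances) / m                                # within-imputation variance
    b = sum((q - qbar) ** 2 for q in estimates) / (m - 1)    # between-imputation variance
    return ubar + (1 + 1 / m) * b

# Hypothetical treatment effect estimates and variances from m = 5 imputed datasets
mi_estimates = [0.42, 0.45, 0.40, 0.44, 0.43]
mi_variances = [0.010, 0.011, 0.009, 0.010, 0.012]

var_mi = rubin_pooled_variance(mi_estimates, mi_variances)
var_cca = 0.0125  # hypothetical variance from the complete case analysis

design_effect = var_mi / var_cca  # < 1 would indicate a precision gain from MI
print(round(design_effect, 3))
```

A design effect near 1.00, as observed in most included trials, means the two analyses gave treatment effect estimates of similar precision.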
Results:
The targeted search identified 547 articles, of which 59 satisfied the eligibility criteria. Included trials tended to be large (median 653 participants) and reported a median of 8.6% missing data in the complete case analysis of the primary outcome (range 0.4%–30.5%). Multiple imputation was most frequently applied using chained equations under a missing at random assumption, with auxiliary variables included in the imputation model in most trials. The median design effect due to multiple imputation was 1.00 in both unadjusted (n = 15 trials) and covariate-adjusted analyses (n = 46 trials), suggesting that multiple imputation typically offered no precision advantage over complete case analysis. Similar design effects were observed in the untargeted review (median 0.96 and 1.01 for unadjusted and covariate-adjusted analyses, respectively), despite higher rates of missing data overall (median 15.7%, n = 49 trials).
Discussion:
Multiple imputation did not consistently yield more precise treatment effect estimates than complete case analysis in the trials included in the review. These findings should not be construed as an argument against the use of multiple imputation, but they suggest that the standard approach of inflating the sample size for anticipated loss to follow-up remains reasonable when multiple imputation is planned for the analysis.
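The standard inflation for anticipated loss to follow-up referred to above can be sketched as follows; with a design effect of roughly 1.00, the same inflation is a reasonable default when multiple imputation is planned. The numbers are hypothetical.

```python
# Illustrative sketch (hypothetical numbers): standard sample size
# inflation for anticipated loss to follow-up.
import math

n_complete = 500  # participants needed with complete primary outcome data
dropout = 0.10    # anticipated proportion lost to follow-up

# Recruit enough participants so that the expected number with complete
# data still meets the target.
n_recruit = math.ceil(n_complete / (1 - dropout))
print(n_recruit)  # 556
```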