Abstract
Background/Aims:
Multiple imputation is often recommended over complete case analysis for handling missing data in clinical trials due to its ability to recover information from participants with incomplete data. While multiple imputation is generally held to be more efficient than complete case analysis, its planned use in clinical trials is typically not considered during sample size estimation. The standard approach of inflating the sample size for anticipated loss to follow-up is applicable for complete case analysis but could lead to excess power and hence inefficient resource use should multiple imputation be planned for analysis. In this article, we systematically reviewed published clinical trials with the aim of quantifying the precision advantages of multiple imputation over complete case analysis in treatment effect estimation, hence informing sample size planning for future trials.
Methods:
We conducted a targeted review of clinical trials published between January 2019 and December 2023 in Lancet, The BMJ, Journal of the American Medical Association and New England Journal of Medicine. Clinical trials were eligible for inclusion if point and variance estimates for the effect of treatment on a primary efficacy or safety outcome could be determined for both multiple imputation and complete case analysis. The design effect due to multiple imputation was calculated as the variance of the treatment effect estimate using multiple imputation divided by the corresponding variance using complete case analysis. As a supplementary analysis, we also conducted an untargeted review of other journals by searching in PubMed for clinical trials with the keywords ‘imputation’ or ‘imputed’ in their title or abstract.
Results:
The targeted search identified 547 articles, of which 59 satisfied eligibility criteria. Included trials tended to be large in size (median 653 participants) and reported a median of 8.6% missing data in the complete case analysis of the primary outcome (range 0.4%–30.5%). Multiple imputation was most frequently applied using chained equations under a missing at random assumption, with auxiliary variables included in the imputation model in most trials. The median design effect due to multiple imputation was 1.00 in both unadjusted (n = 15 trials) and covariate-adjusted analyses (n = 46 trials), suggesting multiple imputation typically was not offering precision advantages over complete case analysis. Similar design effects were observed in the untargeted review (median 0.96 and 1.01 for unadjusted and covariate-adjusted analyses), despite higher rates of missing data overall (median 15.7%, n = 49 trials).
Discussion:
Multiple imputation did not consistently lead to more precise treatment effect estimates than complete case analysis in the trials included in the review. Findings should not be construed as an argument against the use of multiple imputation but suggest the standard approach of inflating the sample size for anticipated loss to follow-up is reasonable when multiple imputation is planned for analysis.
Introduction
Missing data are a common threat in clinical trials and can severely compromise study findings if handled inappropriately. When data are missing, the validity of any analysis depends on the process responsible for the data being missing, termed the missing data mechanism. Traditionally, data have been classified as missing completely at random if the probability of missing data is unrelated to observed or unobserved data, missing at random if the probability of missing data is unrelated to unobserved data, conditional on observed data, or missing not at random if the probability of missing data depends on unobserved data.1 As these classifications can be difficult to assess with multiple variables subject to missing data, m-DAGs (directed acyclic graphs with nodes to indicate missingness for each incomplete variable) may instead be favoured for depicting assumptions about the missing data mechanism and guiding the choice of analysis.2 As the missing data mechanism cannot typically be verified from observed data, analysis should proceed under a plausible assumption about the missing data, consistent with the targeted treatment effect (estimand),3 with the robustness of findings to this assumption assessed in sensitivity analyses.4,5
The default approach to handling missing data in most statistical packages is to restrict analysis to participants with complete data on all analysis model variables, termed a complete case analysis (CCA). While CCA is often viewed as having limited applicability due to its general reliance on a missing completely at random assumption,5–7 it can produce valid estimates in specific situations under alternative missing data mechanisms. For example, if missing data are confined to a univariate (i.e. not repeatedly measured) outcome of a clinical trial and the probability of missingness depends only on treatment group and covariates for adjustment, CCA provides unbiased and efficient treatment effect estimates.8,9 As more complex missing data mechanisms often occur in practice, alternatives to CCA are needed.
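To illustrate the situation described above, the following Python sketch simulates a hypothetical two-arm trial in which only a univariate continuous outcome is missing and the probability of missingness depends on treatment group and a binary baseline covariate. All values (a true effect of 0.5, missingness probabilities of 0.6 and 0.1, the sample size and seed) are illustrative assumptions, and stratification on the covariate stands in for regression adjustment.

```python
import random


def simulate_trial(n=100_000, effect=0.5, seed=2024):
    """Simulate a hypothetical two-arm trial with a missing continuous outcome.

    Missingness depends only on treatment arm t and a binary baseline
    covariate x, never on the outcome itself. Returns the outcomes of
    complete cases, grouped by (t, x).
    """
    rng = random.Random(seed)
    observed = {(t, x): [] for t in (0, 1) for x in (0, 1)}
    for _ in range(n):
        t = rng.randint(0, 1)                      # 1:1 randomisation
        x = rng.randint(0, 1)                      # prognostic baseline covariate
        y = 1.0 + effect * t + 2.0 * x + rng.gauss(0.0, 1.0)
        p_missing = 0.6 if (t, x) == (1, 1) else 0.1
        if rng.random() >= p_missing:              # outcome observed
            observed[(t, x)].append(y)
    return observed


def _mean(values):
    return sum(values) / len(values)


def unadjusted_cca(observed):
    """Difference in mean outcome between arms among complete cases."""
    treated = observed[(1, 0)] + observed[(1, 1)]
    control = observed[(0, 0)] + observed[(0, 1)]
    return _mean(treated) - _mean(control)


def stratified_cca(observed):
    """Covariate-adjusted CCA: pool within-stratum differences in means.

    The simulated effect is constant across strata, so weighting strata by
    their number of complete cases is adequate here.
    """
    total, n_total = 0.0, 0
    for x in (0, 1):
        n_x = len(observed[(1, x)]) + len(observed[(0, x)])
        total += n_x * (_mean(observed[(1, x)]) - _mean(observed[(0, x)]))
        n_total += n_x
    return total / n_total


obs = simulate_trial()
print(f"unadjusted CCA: {unadjusted_cca(obs):.2f}, stratified CCA: {stratified_cca(obs):.2f}")
```

In this set-up the unadjusted CCA is biased (its expectation is about 0.12 rather than 0.5), because missingness depends on the covariate differently in each arm, whereas the covariate-stratified CCA stays close to 0.5: within each covariate stratum, the complete cases in each arm are a random subset of the participants randomised to that arm.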
Among possible alternatives to CCA, multiple imputation (MI) has emerged as one of the most popular in clinical trials, owing to its accessibility and flexibility. In MI, a statistical model is fitted to observed data and used to predict the missing values, with multiple predictions generated to reflect uncertainty due to missing data. Following imputation, each completed dataset is analysed separately using the trial’s primary analysis model, and the results are combined using Rubin’s rules.1 In its standard implementation, MI provides valid estimates when the imputation model is appropriately specified and data are missing at random, although the approach can also be applied under missing not at random assumptions. An appealing feature of MI is the ability to incorporate information from auxiliary variables, which are variables not required in analysis but used for predicting missing values. The inclusion of auxiliary variables can improve precision and increase the plausibility of the missing at random assumption, hence reducing bias.10 Supported by an extensive evidence base, MI is generally held to be more efficient than CCA, although to what extent depends on the amount of information that can be recovered from participants with incomplete data.11,12 While simulation studies have provided considerable insight into the comparative performance of MI and the precision gains possible with auxiliary variables,10,12–15 little is known about its impact on estimation in real clinical trials.
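As a minimal sketch of the pooling step, the function below applies Rubin’s rules as they are usually stated: the pooled estimate is the mean of the m completed-data estimates, and the total variance adds the mean within-imputation variance to (1 + 1/m) times the between-imputation variance. The function name is hypothetical, and the degrees of freedom use Rubin’s original large-sample formula; statistical packages may apply a small-sample adjustment instead.

```python
from statistics import mean, variance


def pool_rubins_rules(estimates, variances):
    """Combine m completed-data results with Rubin's rules.

    estimates: treatment effect estimates, one per imputed dataset
    variances: their squared standard errors, in the same order
    Returns (pooled estimate, total variance, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with one variance each")
    q_bar = mean(estimates)            # pooled point estimate
    w_bar = mean(variances)            # mean within-imputation variance
    b = variance(estimates)            # between-imputation variance (divisor m - 1)
    total = w_bar + (1 + 1 / m) * b    # total variance of q_bar
    r = (1 + 1 / m) * b / w_bar        # relative increase in variance due to missing data
    # Rubin's (1987) large-sample degrees of freedom; infinite if b == 0
    df = float("inf") if r == 0 else (m - 1) * (1 + 1 / r) ** 2
    return q_bar, total, df
```

For example, three imputed-data estimates of 1.0, 1.2 and 0.8 with variances 0.04, 0.05 and 0.03 give a pooled estimate of 1.0 and a total variance of about 0.093, more than double the average within-imputation variance of 0.04 because the estimates vary noticeably across imputations.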
Despite the ability of MI to recover information from incomplete cases, its planned use in clinical trials is typically not considered during sample size estimation. Recent work has shown the amount of statistical power that can be gained by MI, relative to CCA, depends on the ratio of the between-imputation to within-imputation variance of the estimate of interest, a quantity that is rarely reported in real applications and difficult to anticipate in advance of analysis.16 In practice, the usual approach to addressing the expected loss in power due to missing data is to multiply the estimated sample size assuming complete data by the inverse of the expected follow-up rate.17,18 For example, if the required sample size with complete data is 500 participants and 80% follow-up is expected, the inverse of the expected follow-up rate is 1/0.8 = 1.25 and the required sample size accounting for missing data is 500 × 1.25 = 625. While this inflation provides the intended power for CCA if follow-up is as expected, it does not account for potential precision gains with MI and hence may result in an over-powered trial and inefficient use of resources should MI be planned for analysis.
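The inflation arithmetic above can be written as a single calculation. This is a hypothetical helper rather than anything taken from the article; the expected follow-up rate is passed as an integer percentage so that rounding up is done with exact integer arithmetic.

```python
def inflate_for_loss_to_follow_up(n_complete: int, follow_up_pct: int) -> int:
    """Return ceil(n_complete / (follow_up_pct / 100)) using integer arithmetic."""
    if not 0 < follow_up_pct <= 100:
        raise ValueError("expected follow-up must be a percentage in (0, 100]")
    # Ceiling division via negated floor division avoids floating-point rounding
    return -(-n_complete * 100 // follow_up_pct)
```

With 500 participants required under complete data, 80% expected follow-up gives 625 and 70% gives 715 (500/0.7 ≈ 714.3, rounded up).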
In this article, we systematically reviewed published clinical trials with the aim of quantifying the precision advantages of MI over CCA in treatment effect estimation. To inform sample size planning in future trials, we explored whether precision gains were associated with specific trial and analysis characteristics, including the use of auxiliary variables in MI, the proportion of missing data and whether covariate adjustment was performed. Secondary aims were to describe differences in the point estimate of the treatment effect between CCA and MI, and for comparison with previous systematic reviews, to assess the quality of implementation and reporting of MI.
Methods
To evaluate the impact of MI on estimation in clinical trials, we systematically reviewed trials reporting point and variance estimates for the treatment effect obtained using both MI and CCA. We conducted a targeted review of clinical trials published in four leading general medical journals and an untargeted review of clinical trials published in other journals. We expected that trials in the targeted review would be of higher quality, representing current best practice in the implementation of MI, whereas trials in the untargeted review would provide further insight into relative performance in settings with smaller sample sizes and/or higher rates of missing data. The PRISMA guidelines were followed for transparent reporting of systematic reviews.19
Search strategies
In the targeted review, we considered original reports of clinical trials published between January 2019 and December 2023 in Lancet, The BMJ, Journal of the American Medical Association and New England Journal of Medicine. These high-impact general medical journals were chosen based on their perceived quality and to allow comparisons with previous reviews of missing data handling in clinical trials.17,20–22 Articles were identified by searching for the term ‘multiple imputation’ within each journal’s website. For the untargeted review, a PubMed search was conducted on 23 March 2023 to identify clinical trials published between January 2019 and March 2023. Search terms were based on an adaptation of the Cochrane sensitivity and precision maximising strategy for identifying clinical trials,23 with additional terms to identify the keywords ‘imputation’ or ‘imputed’ in the title or abstract. The full search strategy for the untargeted review is provided in the Supplemental Material (Appendix 1).
Inclusion criteria
Clinical trials conducted in humans and published in English were eligible for inclusion if point and variance estimates for the effect of treatment on a primary efficacy or safety outcome were reported or could be determined (e.g. from confidence intervals or standard errors) for both MI and CCA. Any analysis method retaining information from participants with incomplete data (e.g. survival analysis with censored observations, linear mixed models for longitudinal data) was not considered to constitute a CCA. Articles were deemed ineligible if MI and CCA were applied using differently specified analysis models or for alternative estimands, for example, CCA for a per-protocol analysis excluding non-compliers and MI for an intention-to-treat analysis. Where covariate-adjusted analyses were reported for both MI and CCA but the adjustment set was not specified for one approach, we assumed adjustment sets were equivalent. Clinical trials involving more than two randomised groups were excluded to avoid the extraction of dependent treatment effects, as were trials involving cluster randomisation or participant crossover given the different statistical issues raised and more limited development of applicable MI methodology. Clinical trials included in the targeted review were excluded from the untargeted review to avoid double counting.
Study selection
Titles and abstracts of identified articles were exported to EndNote software and examined to assess eligibility. Articles were classified as ineligible, with reason, or potentially eligible. Full texts of potentially eligible articles were then examined to confirm eligibility, with information from eligible articles transcribed to a pre-piloted standardised data extraction form (Supplemental Material, Appendix 1). Details reported in supplementary materials were assessed during the full-text review. For each article, eligibility assessment and data extraction were performed by one of two reviewers (TRS and JMB), with any uncertainties resolved via discussion.
Data extraction
For each eligible article, data were extracted on the number of participants randomised, type of intervention and type of primary outcome (where the primary outcome was identified using standardised criteria, see Supplemental Material, Appendix 1). Concerning missing data handling, information was extracted on the approach to missing data in sample size calculations, primary method of analysis (MI, CCA, other), analysis model for complete cases, MI procedure details (method, number of imputations, inclusion of auxiliary variables) and proportion of participants excluded from the CCA of the primary outcome. Point and variance estimates (standard errors or confidence intervals) for the treatment effect on the primary outcome were extracted for both MI and CCA, with separate estimates obtained for unadjusted and covariate-adjusted analyses where available.
Statistical methods
Trial characteristics and approaches to handling missing data were summarised descriptively. To quantify the precision advantages of MI, design effects were calculated as the variance of the treatment effect estimate using MI divided by the corresponding variance using CCA; design effects less than one therefore indicate MI is more precise than CCA. Design effects were typically calculated from confidence intervals for the estimated treatment effect (i.e. (width of MI confidence interval / width of CCA confidence interval)²), with a log transformation applied to the limits where the treatment effect was expressed as an odds ratio, relative risk or rate ratio. The standardised percent change in the treatment effect estimate with MI, following log transformation where applicable, was derived as 100 multiplied by the difference between MI and CCA treatment effect estimates divided by the standard error of the CCA treatment effect estimate. All statistical calculations were performed using Stata v18 (StataCorp LP).
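The two derived quantities can be sketched as follows, assuming 95% Wald-type intervals that are symmetric on the analysis scale (the log scale for odds ratios, relative risks and rate ratios). The function names are hypothetical and the calculations were not necessarily implemented this way in Stata. Because MI intervals are often based on t rather than normal critical values, the squared width ratio is only an approximation to the variance ratio.

```python
import math
from statistics import NormalDist


def _analysis_scale(value, ratio):
    # Odds ratios, relative risks and rate ratios are compared on the log scale
    return math.log(value) if ratio else value


def design_effect(mi_ci, cca_ci, ratio=False):
    """Approximate Var(MI) / Var(CCA) as the squared ratio of CI widths.

    mi_ci, cca_ci: (lower, upper) confidence limits for the treatment effect.
    Values below 1 indicate that MI is more precise than CCA.
    """
    mi_width = _analysis_scale(mi_ci[1], ratio) - _analysis_scale(mi_ci[0], ratio)
    cca_width = _analysis_scale(cca_ci[1], ratio) - _analysis_scale(cca_ci[0], ratio)
    return (mi_width / cca_width) ** 2


def se_from_ci(ci, ratio=False, level=0.95):
    """Standard error implied by a symmetric normal-theory interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower, upper = (_analysis_scale(v, ratio) for v in ci)
    return (upper - lower) / (2 * z)


def standardised_percent_change(mi_est, cca_est, cca_ci, ratio=False):
    """100 x (MI estimate - CCA estimate) / SE of the CCA estimate."""
    diff = _analysis_scale(mi_est, ratio) - _analysis_scale(cca_est, ratio)
    return 100 * diff / se_from_ci(cca_ci, ratio)
```

For instance, an MI interval of (0.2, 1.0) against a CCA interval of (0.1, 1.1) on a mean-difference scale gives a design effect of (0.8/1.0)² = 0.64, while odds ratio intervals of (0.5, 2.0) and (0.25, 4.0) give 0.25 after log transformation.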
Results
Targeted review of high-impact general medical journals
The targeted search identified 547 articles, 311 of which were excluded based on a review of titles and abstracts. Following full-text review, 59 of the remaining 236 articles satisfied eligibility criteria and were included in the review (Figure 1). The full list of included articles is provided in the Supplemental Material (Appendix 2).

Figure 1. Search results for the targeted review.
Key characteristics of the included clinical trials are shown in Table 1. Overall, the trials tended to be large in size (median 653 participants) and often involved the evaluation of a therapeutic drug or medicine. Most clinical trials had a binary (n = 32, 54%) or continuous (n = 23, 39%) primary outcome, with the odds ratio, relative risk and risk difference all common choices for expressing the effect of treatment for binary outcomes. Linear (n = 26, 44%) and logistic regression (n = 9, 15%) were frequently chosen for the analysis of the primary outcome, with a range of other analysis approaches adopted in the remaining trials. All 59 clinical trials employed an intention-to-treat or modified intention-to-treat approach for the main analysis.
Table 1. Trial characteristics in the targeted review.
Includes models involving random effects or estimation of robust standard errors.
Excluding two clinical trials with incomplete reporting, the percentage of missing data in the CCA of the primary outcome ranged from 0.4% to 30.5%, with a median of 8.6% (Table 2). Missing data typically occurred in the outcome, although in two clinical trials missing data were confined to covariates for adjustment. Most clinical trials (n = 40, 68%) inflated their sample size to account for anticipated missing data; in 26 of these (65%) the inflation was conservative and the observed percentage of missing data was lower than anticipated. CCA was more commonly chosen than MI as the primary method for handling missing data (53% vs 39%), with the remaining five trials opting for single imputation methods (e.g. non-responder imputation) in the primary analysis. MI was most frequently applied using chained equations under a missing at random assumption, with auxiliary variables included in the imputation model in most trials (n = 41, 69%; n = 20 incorporating post-randomisation auxiliary variables and n = 21 with baseline auxiliary variables only). A median of 25 imputations was used for MI, in general exceeding the percentage of missing data and so likely keeping Monte Carlo error acceptably low.24 Key details on the implementation of MI were often poorly reported, with the method of MI and number of imputations not stated in 17 (29%) and 21 (36%) articles, respectively, and the use of auxiliary variables unclear in 14 articles (24%). Where available, statistical analysis plans included in supplementary materials often contained little detail on how MI would be implemented.
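The rule of thumb applied above (at least as many imputations as the percentage of missing data) can be checked directly. The sketch below also includes Rubin’s approximate relative efficiency of m imputations compared with infinitely many, 1/(1 + γ/m), where γ is the fraction of missing information. Both function names are hypothetical, and γ generally differs from the proportion of missing data, so it must be estimated or assumed.

```python
def meets_imputation_rule_of_thumb(m: int, pct_incomplete: float) -> bool:
    """True if the number of imputations is at least the percentage of incomplete cases."""
    return m >= pct_incomplete


def relative_efficiency(m: int, fmi: float) -> float:
    """Rubin's approximate efficiency of m imputations relative to infinitely many.

    fmi is the fraction of missing information (0 <= fmi < 1), which generally
    differs from the proportion of missing data and must be estimated or assumed.
    """
    if m < 1 or not 0 <= fmi < 1:
        raise ValueError("need m >= 1 and 0 <= fmi < 1")
    return 1 / (1 + fmi / m)
```

With the review’s median of 25 imputations and median of 8.6% missing data the rule is met, and for an assumed fraction of missing information of 0.3, 25 imputations give a relative efficiency of 1/(1 + 0.3/25), about 0.99.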
Table 2. Handling of missing data in the targeted review.
Missing data were typically confined to the outcome, but could also occur in covariates for adjustment.
Missing at random assumed if no statement about implementation under a missing not at random mechanism; no trials presented an m-DAG.
Figure 2 shows the distribution of design effects associated with the use of MI. The median design effect due to MI was 1.00 in both unadjusted (interquartile range 0.93–1.00, n = 15) and covariate-adjusted analyses (interquartile range 0.97–1.08, n = 46), suggesting MI was not typically offering precision advantages over CCA. While design effects did not appear to be affected by the proportion of missing data, the types of auxiliary variables used (median adjusted design effect 1.00 vs 1.03 in trials with post-randomisation vs baseline-only auxiliary variables) or other trial and analysis characteristics, their variability across trials increased with the proportion of missing data (Supplemental Material, Appendix 2). Four design effects were particularly pronounced. In one trial (design effect 0.60),25 the confidence interval for the treatment effect was reported with low precision, which may have affected the accuracy of the derived design effect. In two further trials (design effects 0.68 and 1.70),26,27 the analysis model fitted following imputation was ambiguous, such that covariate adjustment sets may have differed between CCA and MI. In the remaining trial (design effect 1.48),28 the reported number of auxiliary variables was extreme given the number of events in the binary outcome, which may have led to overfitting and a larger treatment effect variance under MI.

Figure 2. Design effects by analysis approach in the targeted review.
The standardised difference in the treatment effect estimate between MI and CCA is shown in Figure 3. While MI failed to demonstrate clear precision advantages over CCA, it often led to noticeable changes in the treatment effect estimate (in either direction). In general, the degree to which MI shifted treatment effect estimates was similar for unadjusted and covariate-adjusted analyses and appeared to increase with the proportion of missing data. Standardised differences did not appear to be associated with design effects due to MI (Supplemental Material, Appendix 2).

Figure 3. Standardised change in the treatment effect estimate (MI vs CCA) by proportion of missing data, targeted review.
Untargeted review
Results of the untargeted systematic review are presented in the Supplemental Material (Appendix 3). Briefly, the untargeted search identified 797 articles, of which 49 satisfied eligibility criteria. Compared to articles in the targeted review, the included clinical trials were smaller in size (median 440 vs 653 participants), involved more missing data (median 15.7% vs 8.6% missing data) and more commonly adopted MI as the primary method of analysis (61% vs 39%). Key details concerning the implementation of MI were again poorly reported, with 17 clinical trials (35%) failing to provide adequate detail on the inclusion of auxiliary variables. The median design effect due to MI was 0.96 in unadjusted analyses (interquartile range 0.91–1.03, n = 21 trials) and 1.01 in covariate-adjusted analyses (interquartile range 0.90–1.07, n = 31 trials), again suggesting MI was not offering consistent precision advantages over CCA.
Discussion
By incorporating information from participants with incomplete data, MI is generally thought to facilitate more precise estimation than CCA. In this article, we aimed to verify whether this was the case for the estimation of treatment effects in published clinical trials. Across a targeted review of four high-impact general medical journals and an untargeted review of other journals, the median design effect due to MI was close to 1.00 for both unadjusted and covariate-adjusted analyses, with average design effects appearing unrelated to specific trial and analysis characteristics. This suggests MI may not offer important precision advantages over CCA in many clinical trials, and thus, the standard approach of inflating the sample size by the inverse of the expected complete data proportion is reasonable when MI is planned for analysis.
Several factors may have contributed to the lack of precision advantages with MI in this review. First, the proportion of missing data was low in several included trials, particularly those published in high-impact journals, limiting the amount of information that could be recovered by MI. We note the median of 8.6% missing data in high-impact journals was comparable to the degree of missingness found in earlier reviews of clinical trials involving the same journals (range 9%–12%).17,20–22 Second, results may reflect a lack of useful auxiliary variables in many of the clinical trials. When missing data occur in the outcome but not key exposure and adjustment variables, as is typically the case in clinical trials, the ability of MI to improve estimation depends solely on the information about the incomplete outcome provided by auxiliary variables.9 In this case, auxiliary variables will be most beneficial for precision when strongly correlated with the incomplete outcome and subject to less missing data.10,14 Perhaps such prognostic auxiliary variables were not available, identified, collected or incorporated in the imputation model in many of the included trials. Finally, MI may not have been implemented optimally in some included trials. In four articles,29–32 for example, MI was applied without the inclusion of auxiliary variables, which in the case of missing data confined to the outcome entails the same assumption about the missing data mechanism as CCA and no ability to improve the precision of treatment effect estimates. In this situation, MI adds unnecessary Monte Carlo error to estimation and should not be used.8,9 Conversely, in some clinical trials,28 the number of auxiliary variables appeared excessive given the number of complete cases, which in previous methodological work has been linked with a reduction in precision with MI.13 While some precision losses with MI may be reasonable in the absence of strong auxiliary variables, reflecting appropriate uncertainty, a large decrease in precision could also be an indicator of an inappropriately specified imputation model requiring further investigation.
Disappointingly, practically important shortcomings in the documentation of key MI procedure details were observed in the review, suggesting reporting inadequacies identified in past reviews persist.20,21 In the review of high-impact journals, for example, the method of MI and number of imputations were not stated in 29% and 36% of articles, respectively, while the use of auxiliary variables was unclear in 24%. Where auxiliary variables were included in imputation models, it was often unclear how they were selected or what the resulting functional form of the imputation model was. In addition, few articles stated the missing data mechanism assumed or offered justification for its plausibility, as recommended in key guidance documents on handling missing data in clinical trials.5,6,18 Because assessing the quality of implementation and reporting of MI was a secondary objective of the review, we did not systematically extract information on all aspects of MI implementation; nevertheless, it remains clear that further work is needed to improve the specification of MI methods in practice. It is encouraging to see more prominence given to missing data handling in the latest CONSORT statement, and we recommend readers consult Box 8 in the explanation and elaboration document for detailed guidance on reporting the use of MI.18
A potential limitation of this review is the selection bias associated with the search strategy and inclusion criteria. The precision advantages offered by MI in high-impact general medical journals, or in articles with imputation mentioned in the title or abstract, may not generalise to the full spectrum of clinical trials where MI has been used. For inclusion in the review, articles had to report point and variance estimates for the treatment effect for both MI and CCA. With guidance documents questioning the validity of CCA and promoting primary analysis under a missing at random assumption where plausible,5,6 many trials may now forgo CCA in favour of MI, with sensitivity analysis considering less-restrictive missing-not-at-random assumptions. Another limitation was that MI was often performed as a sensitivity analysis to CCA, which may have reduced the rigour of its implementation or extent of reporting, particularly in light of journal space restrictions. Finally, the accuracy of the design effects may have been diminished by limited precision in the reporting of confidence intervals or mismatching covariate adjustment sets between CCA and MI (in the few trials where this information was ambiguous). We do not expect these issues affected overall study conclusions, however.
In summary, MI did not typically lead to more precise treatment effect estimates than CCA in the clinical trials included in this review. This result should not be construed as an argument against the use of MI, as differences in treatment effect estimates were often evident between MI and CCA in included trials, and precision gains may be possible with strong auxiliary variables and higher proportions of missing data. However, the lack of a clear precision advantage of MI in this review suggests the standard approach of inflating the sample size by the inverse of the expected complete data proportion is generally reasonable when MI is planned for analysis. As noted in previous reviews, continued efforts are needed to improve the reporting of key MI procedure details in publications of clinical trials.
Supplemental Material
Supplemental material, sj-docx-1-ctj-10.1177_17407745261422359 for Multiple imputation in clinical trials – what difference does it make?: A systematic review of the impact of multiple imputation on treatment effect estimation by Thomas R Sullivan, Katherine J Lee, Jana M Bednarz and Lisa N Yelland in Clinical Trials
Footnotes
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: TRS is supported by a Hospital Research Foundation Group fellowship (ID 104-83100). KJL is supported by a National Health and Medical Research Council investigator grant (Level 1, ID 2017498). This research was supported by a Centre of Research Excellence grant from the NHMRC (ID 1171422) to the AusTriM Research Network.
Supplemental material
Supplemental material for this article is available online.
References
