Abstract
Background:
Missing data compromise the internal and external validity of trial findings; however, there is limited evidence on how best to reduce missing data in palliative care trials.
Aim:
To assess the association between participant and site level factors and missing data in palliative care trials.
Design and setting:
Individual participant-level data analysis of 10 phase 3 palliative care trials using multi-level cross-classified models.
Results:
Participants with missing data at the previous time-point and poorer performance status were more likely to have missing data for the primary outcome and quality of life outcomes at both the primary follow-up point and the end of follow-up. At the end of follow-up, the number of site randomisations and the number of study site personnel were significantly associated with missing data. Trial duration and the number of research personnel explained most of the variance at the trial and site levels, respectively, except for the primary outcome at the primary follow-up point, where the amount of data requested was most important at the trial level. Variance at the trial level was more substantial than at the site level across models, and considerable variance remained unexplained for all models except quality of life at the end of follow-up.
Conclusion:
Participants with a poorer performance status are at higher risk of missing data in palliative care trials and require additional support to provide complete data. Performance status is a potential auxiliary variable for missing data imputation models. Reducing trial variability should be prioritised and further factors need to be identified and explored to explain the residual variance.
Keywords
Missing data reduce the power, precision, generalisability and validity of study findings.
A systematic review of palliative care trials found nearly one quarter of primary outcome data were missing.
It is essential that missing data are reduced as much as possible, but how to achieve this effectively is unknown.
Using individual participant-level data, this study found a poorer performance status and missing data at a previous time-point are strongly and consistently associated with missing data in palliative care trials.
Site-level factors were also found to have a significant association with missing data at the end of follow-up, although variability between trials was more substantial than between sites.
Trial duration and the number of research personnel explained most of the variance at the trial and site levels, respectively, except for the primary outcome at the primary follow-up point, where the amount of data requested was most important at the trial level.
Participants with a poorer performance status and those with previous missing data are at higher risk of missing data in palliative care trials and should be identified early and provided with additional support to enable the provision of complete data; they should not be excluded from trials.
Performance status, in particular, could also be considered as an auxiliary variable for missing data imputation models in palliative care trials, making a missing at random assumption more plausible and missing data analyses more robust.
Reducing variability between trials is important and further assessment of how site-level factors affect missing data is required.
Introduction
Missing data compromise the power, precision, generalisability and validity of study findings. A systematic review of palliative care trials found nearly one quarter of primary outcome data were missing with evidence of subsequent bias. 1
To minimise the impact of missing data on trial findings, prevention is important as statistical methods to handle existing missing data are based on unverifiable assumptions.2,3 However, there is little research on how best to reduce missing data. 2 A Cochrane review of trials testing strategies to improve retention in randomised controlled trials, an overlapping issue, included 38 studies that assessed a range of interventions. 4 Assessment of interventions to improve trial management and site-level factors was limited, and most were evaluated in single trials in a particular context. 4
Effective interventions to reduce missing data should be based on evidence. It is necessary to identify the factors associated with missingness to inform the design of such interventions. A meta-regression of factors associated with missing data in a systematic review of palliative care trials found that trial duration and the amount of data requested during the trial were associated with missing data. 1 However, there was insufficient evidence that participant-level factors such as age and performance status were associated with missing data. 1 This analysis, however, relied on aggregate-level participant data. Furthermore, there were no data regarding site-level factors.
Individual participant-level data analysis, on the other hand, uses the raw unit-level data from each primary study.23,24 This allows different sources of heterogeneity in the effect estimate (e.g. participant, site and trial-level) to be explored, multiple participant-level factors to be examined in combination, missing data to be identified and handled at the individual level, and models to be developed and validated using statistical techniques that are standardised across studies. 5
The aim of this study was to use individual participant-level data to assess participant and site-level factors associated with missing data in palliative care trials. The objectives were to:
Assess factors associated with missing data for the primary outcome at the primary follow-up point (Timepoint 1).
Assess factors associated with missing data for any primary quality of life (QoL) outcome at Timepoint 1, given its importance to palliative care clinical practice. 6
Assess factors associated with missing data for the primary outcome and QoL outcome at the end of follow-up (Timepoint 2).
Methods
Protocol
The protocol for this review could not be registered with PROSPERO, 7 as it was a methodological review and did not meet the requirements for registration. The protocol was internally peer-reviewed by methodological experts including those with expertise in individual participant level data analysis.
Eligibility criteria
Randomised controlled trials were eligible if: participants were over 18 years old with an advanced life-limiting illness and palliative care needs; the interventions were palliative, with the primary aim of improving QoL rather than survival (although improved survival could be a secondary gain); the comparator was a placebo, standard care or another palliative intervention; the primary outcome was patient-reported; the trial was adequately powered a priori; and the trial was completed in the 5 years before this study began. Published and unpublished trials were included with no language restrictions.
Identifying studies, selection and data collection process
Trials were identified through professional contacts. The level of missing data was unknown to the principal investigator (JH) before contact and did not influence whether the trial protocol was assessed. All 10 anonymised datasets were accessed securely.
Data items
The participant, site and trial-level data items extracted are presented in Table 1. The primary outcome and QoL data were extracted at baseline (Timepoint 0), the primary follow-up point (Timepoint 1) and the end of follow-up (Timepoint 2). If the QoL measure was not the primary outcome, data for the main QoL measure of interest were extracted. The Australia-Modified Karnofsky Performance Scale (AKPS) was extracted at Timepoint 0 and Timepoint 1.
Table 1. Explanatory variables assessed.
Specification of outcomes and explanatory variables
The pre-specified outcome variables of interest were whether the primary outcome and QoL scores were observed or missing at Timepoint 1 and Timepoint 2:
Timepoint1-PO-missing = whether the primary outcome was missing at Timepoint 1
Timepoint2-PO-missing = whether the primary outcome was missing at Timepoint 2
Timepoint1-QoL-missing = whether the QoL outcome was missing at Timepoint 1
Timepoint2-QoL-missing = whether the QoL outcome was missing at Timepoint 2.
For the primary outcome, a single symptom-control measure was used in all the trials (this was not a pre-specified criterion); therefore, whether this value was entered in the dataset was coded as 1 = missing and 0 = observed. The QoL measures consisted of a number of question items (range 20–28 items), which were, in general, either all answered or all missing. These outcomes were therefore dichotomised as observed (all items present) or missing (one or more items missing). Specifications of the explanatory variables are available in Table 1. A systematic review of palliative care trials found that trial duration and the number of individual questions and tests requested from the participant (a measure of trial burden) were associated with missing data. 1 The models adjusted for these two variables because the individual participant-level data analysis aimed to determine whether participant and site-level factors were associated with missing data once these trial-level factors were taken into account.
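The outcome coding described above can be sketched as follows. The function names, and the convention that an absent entry is stored as None or NaN, are assumptions for illustration rather than the trials' actual data formats.

```python
import math
from typing import Optional, Sequence

# Assumption (not from the source): an absent entry is stored as None or float NaN.


def is_missing(value: Optional[float]) -> bool:
    """True if a single recorded value is absent."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def primary_outcome_missing(value: Optional[float]) -> int:
    """Single-item primary outcome: 1 = missing, 0 = observed."""
    return int(is_missing(value))


def qol_missing(items: Sequence[Optional[float]]) -> int:
    """Multi-item QoL measure: 1 = missing if one or more items are absent, else 0."""
    return int(any(is_missing(item) for item in items))
```

For example, a questionnaire with one blank item, `qol_missing([3, 4, None, 2])`, is coded as missing, which matches the "all present or absent" dichotomy because items were, in general, either all answered or all missing.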
Synthesis methods
Multilevel cross-classified models were developed because participants (level 1 units) were nested within combinations of trials and sites (level 2 units). A mixed-effects model was used, with fixed effects for all independent variables and random intercepts for trials and sites. 8
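In notation, the additive cross-classified model can be sketched as below, assuming a logit link (consistent with the odds ratios and the latent variable variance partition used in this paper). The symbols are ours: y_{i(j,k)} = 1 if the outcome is missing for participant i recruited in trial j at site k, x holds the explanatory variables with fixed effects β, and u_j and v_k are the trial and site random intercepts. Because a site can recruit into several trials, u_j and v_k are crossed rather than nested.

```latex
\operatorname{logit} \Pr\bigl(y_{i(j,k)} = 1\bigr)
  = \beta_0 + \mathbf{x}_{i(j,k)}^{\top}\boldsymbol{\beta} + u_j + v_k,
\qquad u_j \sim N(0, \sigma^2_u), \quad v_k \sim N(0, \sigma^2_v)
```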
Analysis strategy
The analysis strategy was based on a systematic approach that commenced with level 1 fixed-effects, then the addition of higher-level explanatory variables, followed by tests for interactions.9,10 Further details of the analysis strategy are available in Supplemental Material 1.
Categorising variables
To determine whether continuous explanatory variables should be treated as categorical variables, the relationship between the explanatory variable and outcome was assessed using a scatter plot. If this indicated that categorisation of the variable might fit the data better, the model treating the variable as a continuous variable was compared to that treating it as a categorical variable using a likelihood ratio test.
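The continuous-versus-categorical comparison is a standard likelihood ratio test. A minimal sketch, assuming the two models are nested (for example, when the categorical coding gives each distinct value its own level, so the linear term is a special case) and that df is the difference in the number of parameters; the log-likelihoods would come from whichever software fitted the models (the study used Stata). The chi-square tail is computed in closed form so that only the standard library is needed; the function names are hypothetical.

```python
import math
from typing import Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series over odd double factorials.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


def likelihood_ratio_test(loglik_restricted: float, loglik_full: float, df: int) -> Tuple[float, float]:
    """LR statistic and p-value for two nested models fitted by maximum likelihood."""
    stat = 2.0 * (loglik_full - loglik_restricted)
    return stat, chi2_sf(max(stat, 0.0), df)


# Hypothetical log-likelihoods: continuous coding vs categorical coding with 2 extra parameters.
stat, p = likelihood_ratio_test(-512.4, -505.1, df=2)  # stat = 14.6, p is about 0.0007
```

A small p-value favours the categorical coding, which is how the site-personnel variable came to be treated as categorical in the Timepoint 2 models.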
Variance
The proportion of the total variance due to the different group-levels was assessed using the variance partition coefficient. The variance partition coefficient is interpreted as the proportion of the total residual variance in the propensity to be missing that is due to differences between either trials or sites, or both. 11 In this analysis the latent variable representation approach was used. 12
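Under the latent variable representation, the level-1 residual variance of a logistic model is fixed at π²/3 (about 3.29), the variance of the standard logistic distribution, so each VPC is a random-intercept variance divided by the sum of all variances. A minimal sketch, assuming a logit link and variances on the log-odds scale; the function name and the optional trial-by-site term are ours.

```python
import math

# Level-1 variance of the standard logistic distribution (latent variable approach).
LEVEL1_VARIANCE = math.pi ** 2 / 3


def variance_partition(trial_var: float, site_var: float, trial_site_var: float = 0.0) -> dict:
    """VPC for each higher level of a cross-classified random-intercept logistic model.

    trial_site_var is the optional trial-by-site interaction variance.
    """
    components = {"trial": trial_var, "site": site_var, "trial_by_site": trial_site_var}
    if any(v < 0 for v in components.values()):
        raise ValueError("variances must be non-negative")
    total = sum(components.values()) + LEVEL1_VARIANCE
    return {level: v / total for level, v in components.items()}
```

With a hypothetical trial variance of 1.0 and no site variance, the trial VPC is 1 / (1 + 3.29), roughly 0.23; summing the entries gives the share due to trials and sites together.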
In each model, many trial-by-site combinations contained several observations; therefore, a random trial-by-site interaction was also tested using a likelihood ratio test 13 and, if appropriate, included as a further level. This relaxed the assumption that trial and site effects were additive. 13
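Relaxing additivity amounts to adding a random effect for each trial-by-site combination. A sketch in our own notation, assuming a logit link: y_{i(j,k)} = 1 if participant i in trial j at site k has the outcome missing, x holds the explanatory variables, and u_j, v_k and w_{jk} are trial, site and trial-by-site random intercepts. The likelihood ratio test compares this model with the one that omits w_{jk}.

```latex
\operatorname{logit} \Pr\bigl(y_{i(j,k)} = 1\bigr)
  = \beta_0 + \mathbf{x}_{i(j,k)}^{\top}\boldsymbol{\beta} + u_j + v_k + w_{jk},
\qquad u_j \sim N(0, \sigma^2_u), \quad v_k \sim N(0, \sigma^2_v), \quad w_{jk} \sim N(0, \sigma^2_w)
```

When σ²_w = 0, a site shifts the log-odds by the same v_k in every trial it recruits for; a non-zero σ²_w lets a site's effect differ from trial to trial.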
Interactions
Interactions of both age and diagnosis with AKPS were tested.
Handling missing data in explanatory variables
There were no missing data for the model outcomes, as missingness was itself the outcome of interest. Missing values for the explanatory variables were explored to determine whether a missing completely at random assumption was justifiable. Where it was not, multiple imputation by chained equations, using imputation models congenial with the model of interest, was conducted within each trial under missing at random and plausible missing not at random assumptions. The effect estimates and random effects were compared as part of principled missing data sensitivity analyses (see Supplemental Material 2).
All analyses were conducted using Stata v.13 and a p-value ⩽0.05 was considered to be statistically significant unless otherwise stated. Data extraction and analyses were completed in December 2017.
Ethics and consent
Secondary analysis of anonymised data of the included trial datasets was allowed under the original human research ethics approval for each trial.
Results
Thirteen studies were screened. Ten were eligible for at least one model, all of which were conducted in Australia and the UK (see Table 2). One trial was excluded (feasibility study) and data were not provided for another two studies. The number of trials included for each model varied because of varying measurement outcomes and time-points (see Table 3). However, the descriptive statistics of the variables included in the models were comparable (Table 3).
Table 2. Characteristics of included trials.
Baseline Australia-modified Karnofsky Performance Scale.
Table 3. Summary of the variables included in each model.
PO: primary outcome; QoL: quality of life; T0: baseline; T1: timepoint 1; T2: timepoint 2; SD: standard deviation.
Outcome variable for the model.
Factors associated with missing data at Timepoint 1
The multivariable models for missing data for the primary outcome at Timepoint 1 showed a strong association between baseline and Timepoint 1 primary outcome missing data (OR 17.2, 95% CI: 8.6, 34.5). This indicates that those with missing data at baseline were highly likely to have missing data at Timepoint 1 (Table 4). As the baseline AKPS increased (i.e. improved), the odds of missing data for the primary outcome at Timepoint 1 reduced significantly (OR 0.78 (95% CI: 0.70, 0.87) per 10-unit increase) (Table 4).
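Because the model is linear on the log-odds scale, the odds ratio of 0.78 per 10-unit AKPS increase can be re-expressed for other differences in AKPS. A minimal sketch, assuming the linear AKPS term implied by reporting a single odds ratio per 10 units; the function name is hypothetical.

```python
import math


def rescale_odds_ratio(odds_ratio: float, per_units: float, new_units: float) -> float:
    """Re-express an odds ratio reported per `per_units` of a covariate per `new_units`.

    The log odds ratio is linear in the covariate, so it scales by new_units / per_units.
    """
    if odds_ratio <= 0 or per_units == 0:
        raise ValueError("odds ratio must be positive and per_units non-zero")
    log_or_per_unit = math.log(odds_ratio) / per_units
    return math.exp(log_or_per_unit * new_units)


per_point = rescale_odds_ratio(0.78, 10, 1)      # roughly 0.975 per AKPS point
forty_points = rescale_odds_ratio(0.78, 10, 40)  # roughly 0.37, e.g. AKPS 80 vs AKPS 40
# The 95% CI limits (0.70, 0.87) rescale in the same way.
ci_forty = tuple(rescale_odds_ratio(limit, 10, 40) for limit in (0.70, 0.87))
```

Under this assumption, a participant with a baseline AKPS of 80 has about 0.37 times the odds of missing primary outcome data at Timepoint 1 of an otherwise similar participant with an AKPS of 40.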
Table 4. Multivariable multi-level model: participant, trial and site level factors associated with missing data for the primary outcome and main QoL outcome at Timepoint1.
PO: primary outcome; QoL: quality of life; Timepoint0: baseline; Timepoint1: primary end-point.
*p < 0.05; ***p < 0.001.
Findings for Timepoint1-QoL-missing were similar. As QoL was a secondary outcome in all trials (this was not a pre-specified criterion), the association with missing data for the primary outcome at baseline was also assessed: participants with missing data for the primary outcome at baseline were more likely to have missing data for the QoL outcome at Timepoint 1 (Table 4).
Factors associated with missing data at Timepoint 2
When the variable site-personnel was treated as a categorical variable (vs continuous), it fitted the data better (Timepoint2-PO-missing p = 0.0009; Timepoint2-QoL-missing p = 0.0002) and was therefore treated as categorical (Table 5, Supplemental Material 3).
Table 5. Multivariable multi-level model: participant, trial and site level factors associated with missing data for the primary outcome and main QoL outcome at Timepoint 2.
PO: primary outcome; QoL: quality of life; Timepoint1: primary end-point; Timepoint2: end of follow-up.
*p < 0.05; ***p < 0.001.
Missing data for the primary outcome at Timepoint 2 were strongly associated with both missing primary outcome data at Timepoint 1 (OR 8.0, 95% CI: 5.4, 11.8) and Timepoint 1 AKPS (OR 0.7 per 10-unit increase, 95% CI: 0.6, 0.8), indicating that those with previous missing data for the primary outcome and poorer performance status were more likely to have missing data at Timepoint 2. Sites that randomised more participants were more likely to have missing data (OR 1.1 per 10 randomisations, 95% CI: 1.0, 1.2). The number of site personnel was also significantly associated with missing data: compared with sites with one research nurse, sites with two research personnel were more likely to have missing data (OR 2.6, 95% CI: 1.1, 6.0) and those with four personnel less likely (OR 0.07, 95% CI: 0.01, 0.8).
Findings for missing QoL data at Timepoint 2 were similar, with an additional strong association with missing data for the primary outcome at Timepoint 1 (OR 11.8, 95% CI: 6.9, 20.3) and trial duration (per 7 days, OR 0.6, 95% CI: 0.5, 0.8).
For all models, there was insufficient evidence of significant interactions between participant-level factors (data not shown).
Variance explained
A non-additive model with a trial-by-site interaction level was preferred for missing data for the primary outcome at Timepoint 1 (p = 0.005) and Timepoint 2 (p = 0.01) (Table 6). When the interaction level was added to the Timepoint1-PO-missing null model, the site-level variance became negligible (<0.0001 on the log-odds scale). This suggests that, for the primary outcome at Timepoint 1, the effect of a site depends on the trial for which it is recruiting, rather than sites having an independent effect on missingness regardless of the trial.
Table 6. Multivariable multi-level model: residual variance, variance partition coefficient (VPC) and proportion of variance explained at the different levels.
Variance partition coefficient (VPC): proportion of the total variance due to the different group levels, i.e. trial and site.
Proportion of the variance explained by the multivariable model compared to the null model (i.e. without covariates).
The multivariable model explained almost all of the variance at the trial and site levels for the Timepoint2-QoL-missing model but not for the other outcomes (Table 6). For the Timepoint1-PO-missing model, the amount of data requested explained the most trial-level variance (Supplemental Material 4). For the Timepoint2-PO-missing, Timepoint1-QoL-missing and Timepoint2-QoL-missing models, trial duration explained the most trial-level variance and the number of research personnel the most site-level variance (Supplemental Material 4).
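The proportion of variance explained in Table 6 compares each level's variance in the multivariable model with that in the null model. A minimal sketch, assuming the usual definition (null variance minus model variance, divided by null variance) at each level; the function name and example values are hypothetical, not taken from Table 6.

```python
from typing import Dict, Optional


def proportion_explained(null_var: Dict[str, float], model_var: Dict[str, float]) -> Dict[str, Optional[float]]:
    """Per-level share of null-model variance removed by adding covariates.

    Returns None for a level whose null-model variance is zero (ratio undefined).
    A negative value means that level's variance increased after adjustment.
    """
    result: Dict[str, Optional[float]] = {}
    for level, v0 in null_var.items():
        v1 = model_var[level]
        result[level] = None if v0 == 0 else (v0 - v1) / v0
    return result


# Hypothetical variances on the log-odds scale.
explained = proportion_explained({"trial": 0.8, "site": 0.2}, {"trial": 0.3, "site": 0.02})
```

Here the covariates would explain 62.5% of the trial-level and 90% of the site-level variance; the `None` case covers levels, such as the site level after adding a trial-by-site interaction, whose null variance is negligible.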
Explanatory variable missing data
Exploration of the missing data suggested that complete case analysis under a missing completely at random (MCAR) assumption was reasonable for the Timepoint1-PO-missing model. However, a missing at random (MAR) assumption was more plausible for the Timepoint1-QoL-missing, Timepoint2-PO-missing and Timepoint2-QoL-missing models; the final models for these outcomes therefore used multiple imputation under MAR. Sensitivity analyses under various missing not at random (MNAR) assumptions did not change the findings considerably (Supplemental Material 2).
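Estimates from multiply imputed datasets are usually combined with Rubin's rules. The source does not state the combining rules used, so this sketch assumes them; it pools one coefficient (for example, a log odds ratio) across m imputations, and the function name is hypothetical. A full analysis would also compute Rubin's degrees of freedom for the confidence interval, which is omitted here.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def pool_rubin(estimates: Sequence[float], std_errors: Sequence[float]) -> Tuple[float, float]:
    """Pool one coefficient across m imputed datasets with Rubin's rules.

    Returns the pooled estimate and its total standard error.
    """
    m = len(estimates)
    if m < 2 or len(std_errors) != m:
        raise ValueError("need at least two imputations with matching standard errors")
    pooled = mean(estimates)
    within = mean([se ** 2 for se in std_errors])  # average within-imputation variance
    between = variance(estimates)                  # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)


# Hypothetical log odds ratios and standard errors from three imputed datasets.
log_or, se = pool_rubin([1.0, 1.2, 0.8], [0.1, 0.1, 0.1])
pooled_or = math.exp(log_or)
```

Pooling happens on the log-odds scale and is back-transformed afterwards, because Rubin's rules assume approximately normal estimates.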
Discussion
Individual participant-level data analysis of 10 palliative care trials found participants with a poorer performance status and those with previous missing data were more likely to have missing data at both Timepoint 1 and Timepoint 2. At Timepoint 2, site-level factors were also significantly associated with missing data. Trial duration and the number of research personnel explained most of the variance at the trial and site levels, respectively, except for the primary outcome at Timepoint 1, where the amount of data requested was most important at the trial level. Variance at the trial level was more substantial than at the site level, and a considerable proportion of the variance remained unexplained at the trial and site levels for most models.
Participant-level factors
Missingness of the primary outcome and principal QoL measure at the previous time-point was strongly and consistently associated with missingness at the time-point of interest. This is most likely driven by complete withdrawals at the previous time-point. However, 17.3% of participants with missing data at a previous time-point provided data at the following time-point of interest. Thus, for participants continuing in a trial, missing data at one time-point should be a ‘red flag’ for future missing data.
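The 17.3% figure is a conditional proportion: among participants missing at one time-point, the share observed at the next. A minimal sketch, assuming aligned per-participant 0/1 missingness indicators; how withdrawn participants entered the denominator is not stated in the source, and the function name and data are hypothetical.

```python
from typing import Sequence


def proportion_returning(prev_missing: Sequence[int], next_missing: Sequence[int]) -> float:
    """Share of participants missing at one time-point who provide data at the next.

    Both sequences hold 1 = missing, 0 = observed, aligned by participant.
    """
    if len(prev_missing) != len(next_missing):
        raise ValueError("sequences must be aligned by participant")
    followed_up = [nxt for prev, nxt in zip(prev_missing, next_missing) if prev == 1]
    if not followed_up:
        raise ValueError("no participants were missing at the previous time-point")
    return sum(1 for nxt in followed_up if nxt == 0) / len(followed_up)


# Hypothetical data for five participants (not from the included trials).
share = proportion_returning([1, 1, 1, 1, 0], [1, 0, 1, 1, 0])  # 0.25
```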
Participants who were more functionally limited were more likely to have missing data. Trialists should not use this to justify the exclusion of participants with poor performance status from palliative care trials in order to reduce missing data. Such patients are a core group who require palliative care input. If they are eligible for the intervention in clinical practice, it is essential that they are actively recruited to trials and supported to provide data, to maximise the generalisability of trial findings. This study highlights the need for further consideration of how best to support this group to provide data as a trial proceeds; this may include a more flexible study design, different modalities of data collection and the use of proxies. 14 Any interventions to reduce missing data need to be evaluated in studies across trials to determine which measures are most effective and cost-effective.14,15
This participant-level data analysis is the first to systematically assess the impact of the AKPS on missing data and to demonstrate a consistent and robust association. AKPS is therefore potentially a useful auxiliary variable16,17 for missing data imputation models in palliative care trials, making a missing at random assumption more plausible. Previous studies using aggregate-level data 1 and participant-level data analysis 18 have not demonstrated an association between performance status and missing data. However, this is likely related to ecological bias 1 and the use of less sensitive measures 18 in these studies.
Site-level factors
Site-level variables and missing data were significantly associated at Timepoint 2 but not at Timepoint 1. This may be because intensive central monitoring and checks are often in place for outcomes at Timepoint 1, but not always at Timepoint 2 owing to limited resources. The impact of site-level practices may therefore be more evident and influential after Timepoint 1, as the burden of the study on participants and site staff increases. At Timepoint 2, sites that recruited more participants were more likely to have missing data. These findings are new, and the underlying reasons need exploration.
The number of research personnel employed at a site and missingness at Timepoint 2 were significantly associated. This was not a dose-response or simple linear relationship, suggesting that there may be other influential site-level factors that were not included in the models, for example whether the researchers work full-time or part-time, staff turnover, research experience, and the level and content of research training. Furthermore, there is recognition that conducting palliative care research can be emotionally challenging, with a need for additional resources to promote job satisfaction and sustainability for trial staff, 19 which may also play an important role in data quality. Further research into how the number of researchers and the research culture at a site may influence missing data is required.
Residual variance
There was significant variance between trials and sites, indicating that trial and site factors had important effects that need to be addressed. This is an important finding, as the missing data literature in palliative care research often focuses predominantly on participant-level factors such as participants’ poor health and fatigue. 20 Also, for missing data for the primary outcome at Timepoint 1, unlike the other outcomes, there was little evidence that some sites were worse than others in a consistent way across trials (site-level variance), suggesting that efforts to reduce missing data for the primary outcome at Timepoint 1 should focus on reducing between-trial variability.
The findings suggest that trial duration and the number of research personnel are key factors to consider when trying to reduce missing data in palliative care trials; however, other factors, such as the amount of data requested, may be more important for the primary outcome at Timepoint 1. The variables included in the models explained almost all the variance at the trial and site levels for Timepoint2-QoL-missing, suggesting these variables have the greatest impact on missing QoL outcome data at Timepoint 2. However, as the Timepoint 2 models estimated a greater number of parameters at the site level, this may reflect over-saturation at the site level. Other participant, trial and site-level factors that were not included in the models may be more crucial for the other outcomes and need to be investigated.
Limitations and strengths
The included trials were a small convenience sample, manageable within the time-frame of the study, which may limit the generalisability of the findings. However, the average extent of missing data for the primary outcome at Timepoint 1 (26%) and participant characteristics are consistent with a systematic review of 108 palliative care trials. 1 The principal investigator (JH) was blind to the extent and risk of bias of missing data and study characteristics at the time of selection, and the included trials involved participants with a range of ages, diagnoses and performance status scores, and varied in duration thus optimising generalisability (Table 2). Although sought, data on ethnicity and socio-economic status at the participant-level were not available consistently across the trials limiting our understanding of the representativeness of the study sample and the effect of these constructs on missing data. Data for two trials were not made available, and if the reason for this is related to missing data, it could bias the findings. 21 The majority of trials included in the sample were pharmacological trials and although two non-pharmacological complex intervention trials were included, additional considerations may be required for trials involving several interacting components. 22 The variables used in the models were restricted to those that were collected consistently across trials and could be quantified reliably.
The strengths of this study include the rigorous methodological approach which included multi-level modelling. Both published and unpublished palliative care trials were included and the participant-level data allowed the integrity of the data to be assessed.
Conclusion
Participants with a poorer performance status and previous missing data are at higher risk of missing data in palliative care trials and require early identification and support to provide complete data. These factors could also be considered as auxiliary variables in missing data imputation models, especially if associated with the missingness outcome. However, performance status explained only part of the residual variance, indicating that other factors affect missing data at the trial level; identifying these factors may help to reduce missing data in future trials, especially if they are modifiable.
Supplemental Material
sj-pdf-1-pmj-10.1177_02692163211040970 – Supplemental material for Performance status and trial site-level factors are associated with missing data in palliative care trials: An individual participant-level data analysis of 10 phase 3 trials
Supplemental material, sj-pdf-1-pmj-10.1177_02692163211040970 for Performance status and trial site-level factors are associated with missing data in palliative care trials: An individual participant-level data analysis of 10 phase 3 trials by Jamilla A Hussain, Ian R White, Miriam J Johnson, Martin Bland and David C Currow in Palliative Medicine
Footnotes
Acknowledgements
The authors would like to acknowledge the Chief Investigators of the trials (Prof. Meera Agar, Dr. Sara Booth, Prof. Katherine Clark, Dr. Paul Glare, Prof. Janet Hardy, Dr. Christine Sanderson) and the Australian national Palliative Care Clinical Studies Collaborative, Southern Adelaide Palliative Services who willingly supplied the data and answered queries when necessary. In addition, we acknowledge Zac Vandersman who extracted and cleaned data from several trials and Belinda Fazekas, Naomi Byfieldt and Dr. Morag Farquhar who answered data queries, and Prof. Lesley Stewart for advice on the protocol.
Author contribution
JH, DC, MJ, MB conceived the idea of the study. JH was the principal investigator and developed the protocol, conducted the data extraction and analysis and wrote the first draft of the paper. IW developed the data analysis protocol, supported the analysis of the data and the interpretation of the analyses and critically revised the paper. DC, MJ and MB helped to develop the protocol, interpret the findings and critically revised the paper. All authors approved the final version of the paper.
Declaration of conflicting interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: David Currow and Miriam Johnson were Chief Investigators of four trials included in the analysis but were not involved in the data extraction or analysis.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was funded as part of a National Institute for Health Research Doctoral Research Fellowship (JAH; reference number DRF-2013-06-001). The National Institute for Health Research was not involved in the study design, data collection, analysis, interpretation of data, writing of the report or the decision to submit the article for publication. IRW was supported by the Medical Research Council Unit Programme MC_UU_12023/21.
Data management and sharing
Statistical data files, additional charts and graphs are available from the corresponding author on request. The corresponding author does not have the right to share the original trial data.
Supplemental material
Supplemental material for this article is available online.
References
