Missing outcome data management in acute stroke trials testing iv thrombolytics. Is there risk of bias?

Abstract

Introduction

Missing outcome data may undermine interpretation of randomised clinical trials by weakening power and limiting apparent effect size. We assessed bias and inefficiency of two imputation methods commonly used in stroke trials evaluating the efficacy of iv thrombolysis.

Patients and methods

We searched the virtual international stroke trials archive (VISTA)-acute for ischaemic stroke patients with 90-day modified Rankin scale as an outcome, and known thrombolysis status. We excluded any with missing 30-day modified Rankin scale. We planned two analyses. First, we calculated odds ratios for outcome in thrombolysed versus non-thrombolysed patients from imputed-only data, (a) among patients with missing 90-day modified Rankin scale and (b) among matched patients with intact data (using propensity score methods and relevant covariates). Imputation approaches were last observation carried forward (LOCF) or multiple imputation. Outcome comparisons used dichotomisation and shift analysis. Second, we calculated whole-population odds ratios using LOCF and multiple imputation (also through dichotomisation and shift analysis), first with the original 1.5% missing outcome data, and then after artificially increasing the burden (5%, 10%, 20%, 30%).

Results

We considered 9657 patients from eight of the studies included in VISTA, of whom 3034 (31%) were thrombolysed. Missing data replacement by LOCF with analysis by dichotomisation gave the highest estimate of thrombolysis influence. Imputing while increasing the burden of missing data progressively raised the odds ratio estimates, though thresholds for overestimation were 10% for LOCF and 20% for multiple imputation.

Discussion

Replacing missing outcome data tended to overestimate differences of thrombolysed versus non-thrombolysed patients, but had minimal impact below a 10% burden of missing data.

Conclusion

In the specific context of acute stroke trials testing iv thrombolytics, replacing missing data by carrying forward the last observation tended to overestimate treatment odds ratios more than multiple imputation.

Keywords

Stroke trials, missing data, epidemiology, last observation carried forward, multiple imputation, complete case analysis

Introduction

Missing data may be defined as values that are not available but that would be meaningful for analysis if they had been observed.¹ Missing outcome data ought to be distinguished and handled in a different way from missing data in covariates and auxiliary data. Missing outcome data may seriously compromise inferences from clinical trials by reducing statistical power, thereby increasing the chance of type II error.^2–4 Faced with missing outcome data, experts have claimed that the highest risk of bias may arise from simply excluding the missing values and analysing only the complete cases (complete-case (CC) analysis).⁵ Several ‘imputation’ methods have been proposed to compensate for missing data, to limit the impact.^1,3

In the stroke field, the most frequent imputation approach has been last observation carried forward (LOCF),^6–16 in which the most recent preceding value is substituted for the missing one. More advanced approaches include estimating-equation methods; among these, multiple imputation (MI) has been favoured after advances in necessary computing power.^16–20 Applying MI, multiple sets of plausible values are created from their model-based predictive distributions, and estimates and standard errors are obtained through multiple-imputation combining rules.
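The two approaches described above can be illustrated with a minimal Python sketch. It assumes that the "multiple-imputation combining rules" are the standard Rubin's rules and that each imputed dataset yields a log odds ratio with its squared standard error; the function names `locf` and `pool_rubin` are hypothetical and are not taken from the study.

```python
import math
from statistics import mean, variance


def locf(day30_score, day90_score):
    """Last observation carried forward: use the day-30 score
    when the day-90 score is missing (represented here as None)."""
    return day30_score if day90_score is None else day90_score


def pool_rubin(estimates, variances):
    """Pool m per-imputation estimates (e.g. log odds ratios) and their
    squared standard errors using Rubin's rules (assumed combining rules)."""
    m = len(estimates)
    pooled = mean(estimates)             # pooled point estimate
    within = mean(variances)             # within-imputation variance
    between = variance(estimates)        # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)      # estimate and its standard error


# Hypothetical log odds ratios from three imputed datasets
log_or, se = pool_rubin([0.20, 0.30, 0.25], [0.04, 0.04, 0.04])
pooled_or = math.exp(log_or)
```

The between-imputation term is what lets MI reflect uncertainty about the missing values; LOCF, by contrast, treats the carried-forward score as if it had been observed.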

We could not identify any previous work within the stroke field that has addressed the effect of imputation strategies, or any assessment of the potential for missingness to undermine interpretation of the results of stroke trials. Our aim was to assess the bias and inefficiency of the two most common imputation methods used in stroke trials testing intravenous thrombolytics, and how these may affect the estimated odds ratio for treatment efficacy.

Methods

For this study, we used data from acute trials held within the virtual international stroke trials archive (VISTA). We selected patients with ischaemic stroke who had received placebo or any drug now known to possess no confirmed influence on stroke outcome, and for whom it was known whether they had received thrombolysis as part of their standard care. We took patients only from studies in which modified Rankin scale (mRS) recorded at day 90 had been a primary or secondary outcome, that had clearly distinguished any missing outcome values from imputed data, and that had recorded vital status to day 90. We investigated certain demographics/biomarkers that we identified from the literature to be related to ischaemic stroke. These included age, sex, systolic and diastolic blood pressure at baseline, National Institutes of Health Stroke Scale (NIHSS) at presentation, pre-stroke mRS score, history of hypertension, history of diabetes, history of hyperlipidaemia, history of smoking, history of ischaemic heart disease or myocardial infarction, history of heart failure, history of atrial fibrillation, history of stroke or TIA and body mass index. Our outcomes included day 30 and day 90 mRS, but only patients with at least an available mRS30 recording were included, so that LOCF could be performed.

We generated p values for comparisons of the baseline characteristics between groups before and after propensity score matching, using the chi-square test for categorical variables and the two-sample t test or Mann–Whitney U test for parametric and non-parametric continuous variables, respectively. All odds ratios (OR) described here refer to the apparent influence of rtPA in treated vs untreated patients. We planned two sets of analysis.

Part 1

For this part, we divided the entire population into two groups: group M (missing mRS90) included patients with an available mRS30 but lacking mRS90; group C (complete data) included only those patients for whom both mRS30 and mRS90 were available. First, we matched patients from group C against those in group M using propensity score methods that considered age, gender, rtPA treatment, mRS30, NIHSS, diabetes and atrial fibrillation as the most relevant covariates. The maximum possible number of matches was included (1:n). As a result, we obtained two new groups M* (missing mRS90 matched) and C* (complete data matched), which we used for the rest of the analysis in this part (Table 1). We calculated treatment OR from imputed-only mRS90 data, separately considering both LOCF and MI, using for outcome comparisons first ‘dichotomisation’ (mRS 0–2 vs 3–6) and then ‘shift analysis’. We did this in group M* and also in group C*, by forcing a simulated imputed dataset as if all mRS90 values were missing. This allowed us to compare the real versus the imputed OR in group C*. We regarded the OR of the real mRS90 data in group C* as the gold standard.
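As an illustration of the dichotomised comparison described above, the following Python sketch computes a crude (unadjusted) odds ratio for a good outcome (mRS 0–2) in treated versus untreated patients. It is a simplification: the ORs reported in this study are adjusted, the direction of the OR (odds of mRS 0–2) is an assumption, and the function names and scores are hypothetical.

```python
def is_good_outcome(mrs):
    """Dichotomise the modified Rankin scale: 0-2 is good, 3-6 is poor."""
    return mrs <= 2


def crude_odds_ratio(treated_scores, untreated_scores):
    """Unadjusted odds ratio for a good outcome, treated vs untreated."""
    a = sum(is_good_outcome(s) for s in treated_scores)    # treated, good
    b = len(treated_scores) - a                             # treated, poor
    c = sum(is_good_outcome(s) for s in untreated_scores)  # untreated, good
    d = len(untreated_scores) - c                           # untreated, poor
    return (a * d) / (b * c)


# Hypothetical day-90 mRS scores, for illustration only
treated = [0, 1, 2, 3, 4, 6]
untreated = [1, 2, 3, 4, 5, 6]
example_or = crude_odds_ratio(treated, untreated)  # (3 * 4) / (3 * 2) = 2.0
```

Applying the same function to real and to imputed mRS90 values in one group gives the kind of real-versus-imputed comparison made in group C*.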

Table 1.

Distribution and comparison of relevant covariates among patients with and without missing data (before and after propensity score matching).

	Before matching (N = 9657)			After matching (N = 1313)
	Group M	Group C	p	Group M*	Group C*	p
Total	141	9516		134	1179
Age, mean (SE)	67.6 (14.4)	69.4 (12.4)	0.09	67.5 (14.3)	67.9 (13.1)	0.65
Sex male, n (%)	69 (48.9)	5196 (54.6)	0.18	66 (49.3)	623 (52.8)	0.43
rtPA, n (%)	41 (29)	2993 (31.5)	0.55	40 (29.9)	358 (30.4)	0.90
NIHSS, median (IQR)	13 (9–18)	12 (8–18)	0.21	13 (9–18)	13 (9–18)	0.85
Diabetes, n (%)	42 (30.9)	2002 (21.8)	0.01	42 (31.3)	347 (29.4)	0.64
AF, n (%)	35 (26.1)	2366 (26.6)	0.91	35 (26.1)	313 (26.6)	0.92
mRS30, median (IQR)	4 (2–5)	4 (2–4)	0.24	3.5 (2–4)	4 (2–4)	0.47
mRS90, median (IQR)	–	3 (1–5)	–	–	3 (1–5)	–

AF: atrial fibrillation; Groups C and C*: patients with both mRS30 and mRS90 available; Groups M and M*: patients with available mRS30 and missing mRS90; IQR: interquartile range; mRS30: modified Rankin scale at day 30; mRS90: modified Rankin Scale at day 90; NIHSS: national institutes of health stroke scale; rtPA: recombinant tissue-type plasminogen activator; SE: standard error.

Part 2

For this part, the whole population was considered. Three strategies were planned for managing missing mRS90 values: imputation through LOCF; imputation through MI; and CC analysis. Whole-population treatment OR were obtained for each strategy. We tested both ‘dichotomisation’ (mRS 0–2 vs 3–6) and ‘shift analysis’ for outcome comparisons. We repeated these analyses in circumstances where we artificially increased the burden of missing data in the whole population from the original 1.5% of missing data to 5%, 10%, 20% and 30% burdens of missing data, respectively. Cancellation of original data to generate missingness was performed randomly, but within patients in group C* (the subset of subjects with complete data, matched through relevant covariates with those with original missing mRS90 data). In doing so, we assumed that missingness in stroke trials occurs within the subset of patients with certain baseline and clinical characteristics (based on personal unpublished data). With this design, we aimed to capture the trends and behaviour of the two imputation methods compared with the so-called ‘worst possible’ CC analysis. As in part 1 of the analysis, we regarded the OR of the real mRS90 data in group C* as the gold standard.
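The random cancellation of outcome data described above could be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually coded for the study: the function name `add_missingness`, the dictionary layout and the fixed seed are all hypothetical.

```python
import random


def add_missingness(mrs90, eligible_ids, target_burden, seed=0):
    """Return a copy of `mrs90` (patient id -> score, None if missing) in which
    randomly chosen patients from `eligible_ids` have their score blanked,
    so that the overall share of missing values reaches `target_burden`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    already_missing = sum(score is None for score in mrs90.values())
    n_extra = round(target_burden * len(mrs90)) - already_missing
    candidates = [pid for pid in eligible_ids if mrs90[pid] is not None]
    if n_extra > len(candidates):
        raise ValueError("not enough eligible patients to reach the target burden")
    result = dict(mrs90)
    for pid in rng.sample(candidates, max(n_extra, 0)):
        result[pid] = None
    return result


# Hypothetical population: 100 patients, one already missing, ids 1-49 eligible
population = {pid: 3 for pid in range(100)}
population[0] = None
with_10_percent = add_missingness(population, range(1, 50), 0.10)
```

Restricting the draw to a matched subset, rather than to the whole population, is what encodes the assumption that missingness clusters in patients with particular characteristics.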

Results

Data from 9991 patients were obtained from eight studies within the VISTA database. After excluding patients with missing mRS30 values (N = 334), we worked with a final population of 9657 patients. This included 141 patients (1.5%) who had missing values for mRS at day 90. Table 1 shows subgroups based on the availability of mRS day 90; the distribution of relevant covariates; and the comparison between groups before and after propensity score matching. We obtained a group M* with a highly preserved number with respect to the original (134/141; the 7 exclusions were due to missing values in covariates) and a group C* much smaller than the original (1179/9516) as expected, formed only by patients with similar characteristics to those in group M matched through the prespecified covariates. The ratio group C*/group M* approximated 9:1.

From the analysis of imputed-only data (Table 2), we observed that the combination of LOCF with a dichotomisation approach gave the highest OR. This was true for the group with original missing (M*) data (OR 1.33 (1.02–1.65) p = 0.23). The OR was exaggerated and became statistically significant with complete (C*) data (OR 1.72 (1.45–1.99) p = 0.001). Focusing on ORs of real vs imputed data in group C*, the LOCF-OR value was inflated by 37% when using a dichotomisation approach, and by 18% when using a shift analysis approach. The MI-OR were largely unaltered compared with the real values, regardless of the type of approach.
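The percentage inflation quoted above is the relative difference between the imputed and the real OR. The short Python sketch below shows the arithmetic using the rounded ORs from Table 2; the helper name `percent_increment` is hypothetical, and small differences from the tabulated increments are consistent with rounding of the published ORs to two decimals.

```python
def percent_increment(imputed_or, real_or):
    """Relative change of the imputed OR with respect to the real OR, in percent."""
    return 100 * (imputed_or - real_or) / real_or


# Group C*, rounded ORs as printed in Table 2
locf_dichotomised = percent_increment(1.72, 1.25)  # about +37.6
locf_shift = percent_increment(1.37, 1.16)         # about +18.1
mi_dichotomised = percent_increment(1.22, 1.25)    # about -2.4
```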

Table 2.

Adjusted OR and p values of imputed-only mRS90 data using LOCF or MI in the group with missing (M*) and without missing data (C*).

	LOCF			MI			Real data (group C*)			Increment (%OR)
	OR	SE	p	OR	SE	p	OR	SE	p	LOCF	MI
Analysis within Group M*
Dichotomisation	1.33	0.32	0.23	1.31	0.31	0.26
Shift analysis	1.15	0.14	0.26	1.11	0.14	0.39
Analysis within Group C*
Dichotomisation	1.72	0.27	0.001	1.22	0.37	0.53	1.25	0.29	0.33	+37.1	–3.2
Shift analysis	1.37	0.16	0.006	1.24	0.20	0.18	1.16	0.15	0.24	+17.9	+6.6

The analysis was repeated following the two most common outcome comparisons in clinical trials (dichotomisation and shift analysis). OR of real mRS90 data in group C* is regarded as the gold standard.

Group M*: patients with available mRS30 and missing mRS90; Group C*: patients with both mRS30 and mRS90 available; LOCF: last observation carried forward; MI: multiple imputation; OR: odds ratio; SE: standard error.

The whole-population OR obtained in part 2 using CC analysis (Table 3) did not differ much from those obtained through either of the modelling analyses (Figures 1 and 2). There was a maximum difference of 17.5% compared with MI in the category that had 30% missing data (dichotomisation approach) and a mean 5% global difference across the remainder of the categories.

Table 3.

Whole population adjusted OR, standard errors and p values comparing three different missing data management strategies by the two most common outcome comparisons in clinical trials (dichotomisation and shift analysis).

	CC			LOCF			MI
	OR (SE)	p	N	OR (SE)	p	N	OR (SE)	p	N
1.5% Missing (original)
Dichotomisation	1.30 (0.11)	0.002	9516	1.30 (0.11)	0.002	9657	1.29 (0.11)	0.002	9657
Shift analysis	1.19 (0.06)	<0.001	9516	1.19 (0.05)	<0.001	9342	1.16 (0.05)	0.001	9657
5% Missing
Dichotomisation	1.29 (0.11)	0.002	9166	1.28 (0.11)	0.004	9657	1.29 (0.11)	0.002	9657
Shift analysis	1.19 (0.06)	<0.001	8856	1.19 (0.05)	<0.001	9657	1.17 (0.05)	0.001	9657
10% Missing
Dichotomisation	1.33 (0.12)	0.001	8686	1.29 (0.11)	0.002	9657	1.29 (0.11)	0.002	9657
Shift analysis	1.19 (0.06)	<0.001	8376	1.19 (0.06)	<0.001	9342	1.16 (0.05)	0.001	9657
20% Missing
Dichotomisation	1.33 (0.12)	0.003	7721	1.33 (0.12)	0.001	9657	1.28 (0.12)	0.008	9657
Shift analysis	1.20 (0.06)	<0.001	7411	1.21 (0.06)	<0.001	9342	1.18 (0.06)	0.001	9657
30% Missing
Dichotomisation	1.40 (0.14)	0.001	6756	1.36 (0.13)	0.001	9657	1.33 (0.14)	0.009	9657
Shift analysis	1.22 (0.07)	<0.001	6446	1.22 (0.06)	<0.001	9342	1.23 (0.06)	<0.001	9657

CC: complete-case; LOCF: last observation carried forward; MI: multiple imputation; OR: odds ratio; SE: standard error; N: total number of patients analysed.

The analysis is repeated increasing the burden of missing data in the whole population.

Figure 1.

OR variation, following the three different analysis strategies, giving a higher burden of missing data. Outcome comparison through dichotomisation.

Figure 2.

OR variation, following the three different analysis strategies, giving a higher burden of missing data. Outcome comparison through shift analysis.

Discussion

In this work, we have shown that replacing missing outcome data may tend to overestimate outcome differences between thrombolysed and non-thrombolysed patients, especially if the LOCF rather than the MI approach is used. We also found that dichotomisation rather than shift analysis may cause overestimation. However, these distinctions all had minimal impact below a 10% burden of missing data.

By comparing the OR for the apparent influence of rtPA in treated versus non-treated patients from exclusively imputed data (part 1), we deliberately tried to maximise the impact of each type of imputation method, in an attempt to observe and compare the tendency of their bias effect. The highest OR were obtained by combining LOCF with the dichotomisation approach. The exaggerated results in group C* compared with those of group M* may be due in part to a higher N, though also to incomplete information regarding mechanisms of missingness.²¹ In contrast, MI-OR showed a moderate variation between groups, between approaches for outcome comparison and also compared with the real values in group C*.

From the analysis of the whole-population (part 2), it is surprising to see how only small differences occurred between CC-OR and either of the two imputation methods, at least up to a 10% burden of missing data. On this basis, for iv rtPA trials with a rate of missing data under 10% and sufficient sample size, using CC analysis may be a reasonable option that will simplify trial analysis. At this point, it may be interesting to understand which strategy might better approximate the true population treatment effect; a larger sample size including imputed values, or a smaller sample including only complete data. As our results do not help to clarify this point, specific work on the subject may be needed.

Regarding the trend for the OR under a higher burden of missing data, we saw a progressive increase in the case of the complete case analysis in contrast to a relative stability up to 10% missing burden for LOCF and up to 20% missing burden for MI. There was a relatively steep increase in the higher categories in both cases (equally for both approaches). This may imply that the modelling properties of the LOCF analysis could be lost in populations with a missing data burden above 10% (with a threshold somewhere between 10% and 20%) and for MI in populations with missing data burden above 20% (with a threshold somewhere between 20% and 30%). Even so, the ‘good modelling properties’ of LOCF shown in this study have some attraction, because mRS30 appeared to be the best predictor of mRS90 in our dataset (with a calculated correlation index of 0.9). However, this interpretation may be undermined by the burden of missing mRS30 data that we did not assess, as we had already excluded patients who lacked mRS30. Overall, MI seems to represent a good modelling method for use in stroke trials testing iv rtPA, with a robust performance regardless of the burden of missing data and the approach chosen for outcome comparison. At this point, we should clarify that these results are based on what we consider a reasonable assumption: that missingness in stroke trials occurs within the subset of patients with comorbidities and an a priori poorer prognosis.

The OR obtained using shift analysis were in general lower than those obtained through dichotomisation. This is not surprising: thrombolysis carries a small early risk for haemorrhage and mortality especially among patients with more severe stroke and poor prognosis and thus does not have proportional effects across all categories of mRS (the statistical assumption of proportional odds is not met).²² This does not undermine use of the shift approach, since the OR for shift remains the best measure of treatment effect across the whole population that was selected for treatment.²³
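For reference, the proportional-odds assumption mentioned above can be stated with the usual textbook form of the ordinal logistic model. This is given as a general sketch and not necessarily the exact specification fitted in this study; the symbols are assumptions introduced here: Y is the day-90 mRS, T indicates thrombolysis, x is a vector of baseline covariates, and the α_j are category-specific intercepts.

```latex
\[
\operatorname{logit} P(Y \le j \mid T, \mathbf{x})
  = \alpha_j + \beta\,T + \boldsymbol{\gamma}^{\top}\mathbf{x},
  \qquad j = 0, 1, \dots, 5,
  \qquad \text{common OR} = e^{\beta}.
\]
```

The assumption is that the single coefficient β applies at every cut-point j. If thrombolysis improves the odds at the favourable end of the scale but adds early risk at the severe end, the cut-point-specific ORs differ and e^β summarises an average shift rather than a constant effect.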

The results of this work apply exclusively to trials testing iv rtPA. How different modelling analyses would perform in other research scenarios, such as observational studies or thrombectomy trials, remains unknown as far as we are aware.

Conclusions

In the specific context of acute stroke trials testing iv thrombolytics, replacing missing data by carrying forward the last observation tended to overestimate treatment OR more than MI. How this may affect overall treatment estimates will depend on the burden of missing data and the approach taken for outcome comparison, since ordinal analysis was more robust to these influences than dichotomisation.

Footnotes

Declaration of Conflicting Interests

The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: JFF, KRL, MAD, RMI and PDL have nothing to declare. LHS reports serving on the scientific advisory boards for Genentech (TIMELESS study NCT03785678 Steering Committee, and expert advisory panel on late window thrombolysis), Diffusion Pharma DSMB PHAST-TSC NCT03763929 and as a Continuing medical education symposium organiser or lecturer (Medtronic, Boehringer Ingelheim).

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Ethical approval

VISTA has Institutional ethical approval (University of Glasgow, MVLS ethics) for the use of fully anonymised data for novel research purposes.

Informed consent

Informed consent was not sought for the present study because it uses pooled, anonymised data from a clinical trials resource.

Guarantor

KRL.

Contributorship

JFF researched the literature, helped structure the methods, and wrote the first draft; KRL and LHS discussed the original idea, conceived the study, approved the methods and contributed decisively to the final version of the article; RMI and MAD helped with the methods section and performed all the statistical analyses; PDL helped with the methods and article editing.

ORCID iDs

Jose Fernandez-Ferro

Miguel A Descalzo

References

1. Little, D’Agostino, Cohen, et al. The prevention and treatment of missing data in clinical trials. N Engl J Med 2012; 367: 1355–1360.

2. Wood, White, Thompson SG. Are missing outcome data adequately handled? A review of published randomized controlled trials in major medical journals. Clin Trials 2004; 1: 368–376.

3. National Research Council (US) Panel on Handling Missing Data in Clinical Trials. The prevention and treatment of missing data in clinical trials. Washington (DC): National Academies Press, 2010.

4. Akl, Briel, You, et al. Potential impact on estimated treatment effects of information lost to follow-up in randomised controlled trials (LOST-IT): systematic review. BMJ 2012; 344: e2809.

5. Moher, Hopewell, Schulz, et al. CONSORT 2010 explanation and elaboration: updated guidelines for reporting parallel group randomised trials. J Clin Epidemiol 2010; 63: e1–37.

6. National Institute of Neurological Disorders and Stroke rt-PA Stroke Study Group. Tissue plasminogen activator for acute ischemic stroke. N Engl J Med 1995; 333: 1581–1587.

7. Hacke, Kaste, Fieschi, et al. Randomised double-blind placebo-controlled trial of thrombolytic therapy with intravenous alteplase in acute ischaemic stroke (ECASS II). Second European-Australasian Acute Stroke Study Investigators. Lancet 1998; 352: 1245–1251.

8. Clark, Wissman, Albers, et al. Recombinant tissue-type plasminogen activator (Alteplase) for ischemic stroke 3 to 5 hours after symptom onset. The ATLANTIS Study: a randomized controlled trial. Alteplase Thrombolysis for Acute Noninterventional Therapy in Ischemic Stroke. JAMA 1999; 282: 2019–2026.

9. Hacke, Albers, Al-Rawi, et al. The desmoteplase in acute ischemic stroke trial (DIAS): a phase II MRI-based 9-hour window acute stroke thrombolysis trial with intravenous desmoteplase. Stroke 2005; 36: 66–73.

10. Hacke, Furlan, Al-Rawi, et al. Intravenous desmoteplase in patients with acute ischaemic stroke selected by MRI perfusion-diffusion weighted imaging or perfusion CT (DIAS-2): a prospective, randomised, double-blind, placebo-controlled study. Lancet Neurol 2009; 8: 141–150.

11. IST-3 Collaborative Group, Sandercock, Wardlaw, et al. The benefits and harms of intravenous thrombolysis with recombinant tissue plasminogen activator within 6 h of acute ischaemic stroke (the third international stroke trial [IST-3]): a randomised controlled trial. Lancet 2012; 379: 2352–2363.

12. Albers, von Kummer, Truelsen, et al. Safety and efficacy of desmoteplase given 3–9 h after ischaemic stroke in patients with occlusion or high-grade stenosis in major cerebral arteries (DIAS-3): a double-blind, randomised, placebo-controlled phase 3 trial. Lancet Neurol 2015; 14: 575–584.

13. Mori, Minematsu, Nakagawara, et al. Safety and tolerability of desmoteplase within 3 to 9 hours after symptoms onset in Japanese patients with ischemic stroke. Stroke 2015; 46: 2549–2554.

14. von Kummer, Mori, Truelsen, et al. Desmoteplase 3 to 9 hours after major artery occlusion stroke: the DIAS-4 trial (efficacy and safety study of desmoteplase to treat acute ischemic stroke). Stroke 2016; 47: 2880–2887.

15. Logallo, Novotny, Assmus, et al. Tenecteplase versus alteplase for management of acute ischaemic stroke (NOR-TEST): a phase 3, randomised, open-label, blinded endpoint trial. Lancet Neurol 2017; 16: 781–788.

16. Albers, Marks, Kemp, et al. Thrombectomy for stroke at 6 to 16 hours with selection by perfusion imaging. N Engl J Med 2018; 378: 708–718.

17. Kidwell, Jahan, Gornbein, et al. A trial of imaging selection and endovascular treatment for ischemic stroke. N Engl J Med 2013; 368: 914–923.

18. Saver, Goyal, Bonafe, et al. Stent-retriever thrombectomy after intravenous t-PA vs. t-PA alone in stroke. N Engl J Med 2015; 372: 2285–2295.

19. Jovin, Chamorro, Cobo, et al. Thrombectomy within 8 hours after symptom onset in ischemic stroke. N Engl J Med 2015; 372: 2296–2306.

20. Mocco, Zaidat, von Kummer, et al. Aspiration thrombectomy after intravenous alteplase versus intravenous alteplase alone. Stroke 2016; 47: 2331–2338.

21. Park, Davis CS. A test of the missing data mechanism for repeated categorical data. Biometrics 1993; 49: 631–638.

22. Saver, Gornbein. Treatment effects for which shift or binary analyses are advantageous in acute stroke trials. Neurology 2009; 72: 1310–1315.

23. Savitz, Lew, Bluhmki, et al. Shift analysis versus dichotomization of the modified Rankin scale outcome scores in the NINDS and ECASS-II trials. Stroke 2007; 38: 3205–3212.