Abstract
Introduction
Controversy remains as to whether post-stroke complications are an independent predictor of poor outcome or merely a reflection of stroke severity. We aimed to identify which post-stroke complications have the highest impact on in-hospital mortality using machine learning techniques. A secondary aim was to identify patient subgroups in which complications have the highest impact.
Patients and methods
Registro Nacional de Ictus de la Sociedad Española de Neurología is a stroke registry of 42 centers of the Spanish Neurological Society. Data from ischemic stroke patients were used to build a random forest combining 500 classification and regression trees to weigh the impact of baseline characteristics and post-stroke complications on in-hospital mortality. A logistic regression analysis including the selected variables was then performed to test for interactions.
Results
A total of 12,227 ischemic stroke patients were included. In-hospital mortality was 5.9%, and median hospital stay was 7 (IQR 4–10) days. In the random forest, stroke severity [National Institutes of Health Stroke Scale > 10, OR = 5.54 (4.55–6.99)], brain edema [OR = 18.93 (14.65–24.46)], respiratory infections [OR = 3.67 (3.02–4.45)] and age [OR = 2.50 (2.07–3.03) for > 77 years] had the highest impact on in-hospital mortality, and all were independently associated with in-hospital mortality. Complications had higher odds ratios in patients with a baseline National Institutes of Health Stroke Scale score < 10.
Discussion
Our study identified brain edema and respiratory infections as independent predictors of in-hospital mortality, rather than mere markers of more severe strokes. Moreover, their impact was higher in less severe strokes, despite their lower frequency in this group.
Conclusion
Brain edema and respiratory infections were the complications with the greatest impact on in-hospital mortality, and their impact was highest in patients with mild strokes. Further efforts to predict these complications could improve stroke outcome.
Keywords
Introduction
Stroke is the fifth leading cause of death in the United States, accounting for one of every 20 deaths; on average, someone dies of stroke every four minutes. 1 In-hospital mortality for ischemic stroke has been estimated at 11–15%. 2 Predicting outcome and mortality after stroke remains a challenge for clinicians. Many predictive scores for in-hospital mortality and long-term disability based on baseline clinical variables have been published, some of them with high accuracy.3,4 However, none is currently recommended by stroke guidelines to guide clinical decisions, and none is used in clinical practice outside research.
Among the factors that account for this high mortality, baseline, non-modifiable factors have been estimated to account for approximately two-thirds of in-hospital deaths, with age and baseline stroke severity being the most relevant. The remaining third is determined by conditions that arise after stroke and lead to poor outcome, namely post-stroke complications. 5 Examples include brain edema, stroke-associated infections, cardiologic complications, seizures and complications related to reperfusion therapies. 6 It therefore seems reasonable that preventing these complications or managing them early could improve outcome, making them an excellent opportunity to influence stroke prognosis. However, controversy remains as to whether these complications lead to poor outcome by themselves or merely reflect stroke severity. 7
Machine-learning techniques have several advantages over other statistical approaches. Random forest is one of the most accurate learning algorithms for analyzing large datasets and provides a precise estimate of the importance of each variable, which may be more reliable than that obtained with traditional statistical methods. 8 In this study, we aimed to identify which post-stroke complications have the highest impact on in-hospital mortality using machine learning techniques. A secondary aim was to identify subgroups of patients in whom these complications have the highest impact.
Methods
The National Stroke Registry of the Spanish Neurological Society (Registro Nacional de Ictus de la Sociedad Española de Neurología, RENISEN) is an electronic database, promoted by the Cerebrovascular Diseases Study Group of the Society, that involves a total of 42 primary stroke centers in Spain (Appendix). The specific objectives of the project are (i) to facilitate epidemiological studies with large sample sizes and (ii) to provide an individual database for each participating center.
Patients and characteristics
The RENISEN registry 9 includes data from stroke patients admitted to the participating hospitals. A diagnosis of stroke is the only inclusion criterion, with no further patient selection. Past medical history is collected on admission, including demographic data, vascular risk factors (hypertension, diabetes mellitus, dyslipidemia, heart disease and previous stroke), medications and functional status. Pre-stroke functional status is assessed with the modified Rankin Scale (mRS) and dichotomized as previously independent (mRS 0–2) or dependent (mRS 3–5). Baseline stroke severity is assessed at hospital admission with the National Institutes of Health Stroke Scale (NIHSS), and a second NIHSS assessment is recorded 24 hours after admission. Only the baseline NIHSS was used in subsequent analyses.
Stroke is diagnosed according to the WHO criteria and classified as ischemic or hemorrhagic by neuroimaging. Only ischemic stroke patients were included in this study. Stroke syndrome is classified according to the Oxfordshire Community Stroke Project criteria, 10 and etiology according to the Trial of Org 10172 in Acute Stroke Treatment (TOAST) criteria. 11
During hospital admission, data on neuroimaging, acute-phase therapies and post-stroke complications are recorded. Post-stroke complications are divided into neurological and systemic complications. Symptomatic hemorrhagic transformation, or symptomatic intracerebral hemorrhage (sICH), is defined according to the ECASS criteria (any intracerebral hemorrhage on follow-up neuroimaging within 36 hours associated with neurological deterioration). 12 Brain edema is considered present if midline shift is seen on follow-up neuroimaging. Early stroke recurrence and seizures are also recorded. Neurological deterioration of unknown origin is diagnosed when the NIHSS score increases by four or more points in the absence of any other complication.
Regarding systemic complications, respiratory, urinary and other infections, acute congestive heart failure (AHF), acute coronary syndrome (ACS) and venous thromboembolism are recorded. Cardiologic complications were recorded according to the treating physician's diagnosis, normally based on AHA guidelines. 13 Accordingly, ACS was considered present if typical chest pain and/or ECG abnormalities (ST-segment elevation or new left bundle-branch block) and/or enzymatic alterations (elevated troponin and/or creatine kinase MB levels) were present. AHF was diagnosed as new-onset dyspnea and/or fatigue together with fluid retention, which may lead to pulmonary and/or splanchnic congestion and/or peripheral edema, in the absence of other documented causes of dyspnea. Owing to their low incidence, cardiologic complications (AHF and ACS) were analyzed together. No predefined criteria were used to define infections; they were recorded according to the physician's diagnosis.
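The two rule-based definitions above (neurological deterioration of unknown origin and sICH) can be expressed as simple predicates. The following is a minimal sketch only: the function and parameter names are hypothetical, RENISEN does not publish code, and the 36-hour window is assumed here to be measured from the reference time used by the registry for follow-up imaging.

```python
# Hypothetical helpers illustrating the complication definitions in the text.
NIHSS_WORSENING_THRESHOLD = 4  # increase of >= 4 NIHSS points
SICH_WINDOW_HOURS = 36         # ECASS window for follow-up imaging, as paraphrased above


def neuro_deterioration_unknown_origin(nihss_baseline: int, nihss_later: int,
                                       other_complication: bool) -> bool:
    """True if the NIHSS rose by >= 4 points and no other complication explains it."""
    worsened = (nihss_later - nihss_baseline) >= NIHSS_WORSENING_THRESHOLD
    return worsened and not other_complication


def symptomatic_ich(hemorrhage_on_imaging: bool, hours_to_imaging: float,
                    neurological_deterioration: bool) -> bool:
    """Any ICH on follow-up imaging within 36 h that is accompanied by deterioration."""
    return (hemorrhage_on_imaging
            and hours_to_imaging <= SICH_WINDOW_HOURS
            and neurological_deterioration)
```

Encoding the definitions as explicit predicates makes the thresholds visible in one place; in practice, the registry relied on the treating physician's assessment rather than an automated rule.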
At hospital discharge, stroke severity (NIHSS) and functional outcome (mRS) are recorded. A complete description of the RENISEN databank and its variables is available in the Users' Handbook. 14
Statistical analyses
Demographic and descriptive data are presented as number (percentage), mean ± standard deviation (SD) or median [interquartile range (IQR)], depending on the data distribution as assessed with the Kolmogorov–Smirnov test. Univariate analyses were performed with in-hospital mortality, the primary outcome, as the dependent variable. Patient groups were compared with the Chi-squared test for categorical variables and with Student's t or Mann–Whitney U tests for continuous variables.
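For a binary exposure (e.g. a complication present or absent) against in-hospital death, the Chi-squared comparison reduces to a 2 × 2 table. The sketch below assumes the Pearson statistic without Yates continuity correction (the text does not specify which variant was used); the function name is hypothetical. With one degree of freedom the p-value can be computed exactly from the complementary error function, so no statistics library is needed.

```python
import math


def chi2_2x2(a: int, b: int, c: int, d: int) -> tuple:
    """Pearson chi-squared test (no Yates correction) for the 2x2 table
    [[a, b], [c, d]]: rows = exposed / unexposed, columns = died / survived.
    Returns (statistic, p_value)."""
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value
```

A variable entering the random forest under the rule described next would simply be one whose p-value is below 0.05.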
All variables associated with in-hospital mortality at p < 0.05 in the univariate analysis were included in a random forest to identify the variables most informative for mortality. This method combines multiple classification and regression trees (CARTs), each obtained by partitioning the dataset and fitting a simple prediction model within each partition (node); the partitioning can be represented graphically as a decision tree. The random forest technique examines a large ensemble of decision trees: each tree is grown on a random sample of the original data drawn with replacement (bootstrapping), and at each node a user-defined number of variables, selected at random from all variables, is considered for splitting. Many trees are built, and the contribution of each variable to each split is recorded. In this analysis, 500 CARTs were grown. The most informative variables were selected as those with the highest mean decrease in Gini impurity,8,15 a measure of how much each variable contributes to the homogeneity of the nodes and leaves of the resulting random forest: each time a variable is used to split a node, the Gini impurity of the child nodes is calculated and compared with that of the parent node. This analysis was performed using R software, v 2.15.0 (R Development Core Team 2012; Vienna, Austria), with the 'randomForest' package. 16
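The two ingredients described above, bootstrap resampling and the Gini impurity decrease at a split, can be sketched in a few lines. This is an illustrative sketch with hypothetical function names, not the 'randomForest' implementation: the package sums such decreases over every split that uses a variable and averages over trees, and its exact node weighting may differ from the simple size weighting used here.

```python
import random
from collections import Counter


def gini_impurity(labels) -> float:
    """Gini impurity 1 - sum(p_k^2) of a node's class labels (e.g. 1 = died, 0 = survived)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right) -> float:
    """Impurity decrease of one split: parent impurity minus size-weighted child impurity."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted_children


def bootstrap_sample(rows, rng: random.Random):
    """Draw len(rows) rows with replacement, as each tree in a random forest does."""
    return [rng.choice(rows) for _ in rows]
```

A split that perfectly separates deaths from survivors yields the largest possible decrease for that node, which is why a variable such as brain edema, when it isolates a high-mortality group, accumulates a high mean decrease Gini across the 500 trees.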
Variables identified in the random forest as the best predictors of in-hospital mortality were entered as covariates into a binary logistic regression analysis. Continuous variables were dichotomized at the cut-off with the highest accuracy, as determined from receiver operating characteristic (ROC) curves. All potential interactions between these covariates were tested using backward stepwise (Wald) selection. When a significant interaction was found, stratified ORs were reported. Patients with a missing value in any variable of interest were excluded from the random forest and logistic regression analyses, and their baseline characteristics were compared with those of the included patients.
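The text does not state how "highest accuracy" was operationalized when choosing ROC cut-offs. With a mortality of about 6%, maximizing raw accuracy would tend to favor classifying everyone as a survivor, so the sketch below assumes Youden's J (sensitivity + specificity − 1) as the criterion; this is an assumption, and the function name is hypothetical.

```python
def best_cutoff(values, outcomes):
    """Return (cutoff, J) maximizing Youden's J = sensitivity + specificity - 1.
    A patient is classified as positive when value >= cutoff.
    outcomes: 1 = died in hospital, 0 = survived."""
    positives = sum(outcomes)
    negatives = len(outcomes) - positives
    best = (None, -1.0)
    for cut in sorted(set(values)):
        true_pos = sum(1 for v, y in zip(values, outcomes) if v >= cut and y == 1)
        true_neg = sum(1 for v, y in zip(values, outcomes) if v < cut and y == 0)
        j = true_pos / positives + true_neg / negatives - 1
        if j > best[1]:  # keep the first (lowest) cut-off reaching the maximum
            best = (cut, j)
    return best
```

Applied to baseline NIHSS or age, the returned cut-off would then define the binary covariate entered into the logistic model.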
The accuracy of the predictive model was evaluated with the area under the ROC curve constructed from the model's predicted probabilities (C-statistic). In addition, Kaplan–Meier curves were plotted for the complications with the highest impact on in-hospital mortality in the random forest analysis, to assess differences in survival according to the occurrence of each complication.
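The C-statistic has a direct probabilistic reading: it is the probability that a randomly chosen patient who died received a higher predicted probability than a randomly chosen survivor, with ties counted as one half. The sketch below computes it by exhaustive pairwise comparison, which is transparent but quadratic in sample size; the function name is hypothetical.

```python
def c_statistic(predicted, outcomes) -> float:
    """Area under the ROC curve as the concordance probability.
    predicted: model probabilities of death; outcomes: 1 = died, 0 = survived."""
    deaths = [p for p, y in zip(predicted, outcomes) if y == 1]
    survivors = [p for p, y in zip(predicted, outcomes) if y == 0]
    concordant = 0.0
    for d in deaths:
        for s in survivors:
            if d > s:
                concordant += 1.0   # death correctly ranked above survivor
            elif d == s:
                concordant += 0.5   # ties count as half
    return concordant / (len(deaths) * len(survivors))
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect separation of deaths from survivors.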
Results
Characteristics of the RENISEN population.
CAD: coronary artery disease; mRS: modified Rankin Scale; NIHSS: National Institutes of Health Stroke Scale; TOAST: Trial of Org 10172 in Acute Stroke Treatment; LAA: large artery atherothrombotic; CE: cardioembolic; LAC: lacunar; UND: undetermined; OCSP: Oxfordshire Community Stroke Project; TACI: total anterior circulation infarct; PACI: partial anterior circulation infarct; LACI: lacunar infarct; POCI: posterior circulation infarct; IV tPA: intravenous tissue plasminogen activator; sICH: symptomatic intracerebral hemorrhage; DVT/PE: deep venous thrombosis/pulmonary embolism.
Values are given as median (IQR) for continuous variables and as number of cases/total cases (excluding missing values) with percentage for categorical variables.
Baseline characteristics, in-hospital complications and outcome of the RENISEN population according to vital status.
RENISEN: Registro Nacional de Ictus de la Sociedad Española de Neurología; CAD: coronary artery disease; mRS: modified Rankin Scale; TOAST: Trial of Org 10172 in Acute Stroke Treatment; LAA: large artery atherothrombotic; CE: cardioembolic; LAC: lacunar; UND: undetermined; NIHSS: National Institutes of Health Stroke Scale; OCSP: Oxfordshire Community Stroke Project; TACI: total anterior circulation infarct; PACI: partial anterior circulation infarct; LACI: lacunar infarct; POCI: posterior circulation infarct; sICH: symptomatic intracerebral hemorrhage; DVT/PE: deep venous thrombosis/pulmonary embolism.
In the random forest plot, stroke severity, brain edema, age and respiratory infections were identified as the variables most explanatory of in-hospital mortality, all of them above the cut-off of the mean importance of the model (42.28). Symptomatic hemorrhagic transformation and cardiologic complications were also explanatory, although their mean Gini decrease was below the cut-off, in the range of other variables such as stroke unit admission, dyslipidemia or diabetes (Figure 1). As an example, one of the CARTs making up the random forest is shown in Figure 2.
Random forest. The points represent the mean decrease Gini value, indicating the importance of each variable, and the broken vertical line represents the mean importance of the model (43.28), i.e. the mean of the importance values of all variables.
Example of one of the classification and regression trees (CARTs) for prediction of in-hospital mortality. Black squares show the variables dividing the dataset and white squares show the predicted in-hospital mortality rate at each node. The sample size in each division of the dataset is also shown. The terminal nodes with the highest and lowest predicted in-hospital mortality rates are marked in gray.

Logistic regression predictive models for in-hospital mortality.
CI: confidence interval; NIHSS: National Institutes of Health Stroke Scale; sICH: symptomatic intracerebral hemorrhage; ACS: acute coronary syndrome.
Patients with missing data in any variable associated with in-hospital mortality were not included in the random forest (N = 2,816) and were therefore compared with the included patients. Because of the large sample size, some variables differed significantly between cohorts in both comparisons, although the absolute differences in the main variables of interest were small. Mortality rates did not differ (Supplemental Table I).
Kaplan–Meier curves for these complications showed that brain edema was associated with early mortality, whereas respiratory infections, sICH and cardiac complications were associated with later deaths (Figure 3). Supplemental Table II shows survival rates in terms of median survival.
Kaplan–Meier curves with survival rates for the most important complications identified in the random forest. Extreme survival values (>30 days) were removed.
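The curves above are product-limit (Kaplan–Meier) estimates. The sketch below shows the estimator under an assumption not stated in the text: follow-up time is counted in days from admission, in-hospital death is the event, and patients discharged alive are treated as censored at discharge. Function and parameter names are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.
    times: days from admission to death or discharge;
    events: 1 = in-hospital death, 0 = censored (assumed: discharged alive).
    Returns a list of (time, survival) pairs at each time with at least one death."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk  # conditional survival at time t
            curve.append((t, survival))
        at_risk -= removed  # deaths and censored patients leave the risk set
    return curve
```

Running the estimator separately for patients with and without a given complication gives the two curves compared in each panel.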
Discussion
In the present study, we investigated the impact of post-stroke complications on in-hospital mortality in a large stroke registry from southern Europe. Brain edema and respiratory infections were identified as the main post-stroke complications and, together with age and baseline stroke severity, predicted in-hospital mortality with an accuracy close to 90%. Of the remaining complications, sICH and cardiologic complications were also identified as possible predictors, although of lesser importance. Our results are in line with those of the Berlin Stroke Register investigators,5,17 who identified brain edema and respiratory infections as the main causes of death depending on length of stay. With similar mortality rates, our study replicates these results in a different population, composed solely of ischemic stroke patients, and using a different statistical approach. Beyond the novelty of using machine learning techniques, the main novel finding of our study is the result of the stratified analysis, which indicates that the impact of post-stroke complications is higher in patients with NIHSS < 10 than in those with the most severe strokes.
Age and baseline NIHSS score are recognized as the main baseline characteristics determining stroke outcome. 18 The simple sum of these two variables (the SPAN-100 index) was explored as a predictive scale in the NINDS-tPA trial cohort, which included 624 patients. SPAN-100-positive patients (age + NIHSS ≥ 100; N = 62, 9.9%) had higher rates of intracerebral hemorrhage and lower rates of favorable composite outcome at 3 months according to the NINDS-tPA trial definition (mRS score of 0–1, NIHSS ≤ 1, Barthel index ≥ 95 and Glasgow Outcome Scale score of 1 at 3 months). 19
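The SPAN-100 rule is simple arithmetic: add age in years to the baseline NIHSS score and flag the patient when the sum reaches 100. A minimal sketch, with hypothetical function names:

```python
def span100_index(age_years: float, nihss: int) -> float:
    """SPAN-100 index: age in years plus baseline NIHSS score."""
    return age_years + nihss


def span100_positive(age_years: float, nihss: int) -> bool:
    """SPAN-100-positive when age + NIHSS >= 100."""
    return span100_index(age_years, nihss) >= 100
```

For instance, an 85-year-old with an NIHSS of 15 reaches exactly 100 and is SPAN-100-positive, whereas an 80-year-old with an NIHSS of 19 is not.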
Malignant brain edema has been recognized as the complication with the highest attributable risk of death in patients with a hospital stay ≤ 7 days, 5 and it causes a high mortality rate, close to 80% in the absence of specific treatment. 20 Fortunately, a sensitive and specific tool to predict the development of brain edema, based on infarct volume measured on diffusion-weighted MRI, 21 already exists. Moreover, hemicraniectomy has been shown to significantly reduce mortality in young patients with brain edema. 22
Some controversy exists as to whether post-stroke respiratory infections are independent predictors of poor outcome after stroke. Some studies have considered infections the main specific cause of death after the first week 5 and responsible for the highest attributable proportion of deaths in acute ischemic stroke, accounting for one-third of all deaths. 23 On the other hand, other studies have suggested that post-stroke respiratory infections are merely markers of stroke severity rather than independent determinants of outcome. 7 Our results strongly argue against this view, since post-stroke respiratory infections had a major impact on in-hospital mortality: they were among the main predictors in the random forest and remained independently associated with mortality after adjustment for interactions, except in patients with malignant brain edema. The apparent protective effect in this subgroup may be related to the low survival of patients with brain edema shown in the Kaplan–Meier curves, since early death leaves little time for respiratory infections to develop.
Two recent clinical trials have shown that preventive antibiotic therapy neither improves functional outcome after stroke nor reduces pneumonia, despite an overall reduction in the global infection rate.24,25 Given these results, identifying the patients in whom an infection could have the greatest impact on stroke outcome is of interest. Several predictive scores have been published, all of which include a high baseline NIHSS score as a risk factor for pneumonia.26,27 Our study, however, suggests that patients with lower baseline NIHSS scores who do not suffer other complications such as brain edema could be the population in which the occurrence of a respiratory infection has the greatest impact on mortality, since higher odds ratios for respiratory infections were noted in these subgroups in the stratified analyses. However, given the expected low rates of pneumonia in this population, additional tools such as inflammatory or immunological blood biomarkers should be considered to select patients for future studies. 28
Hemorrhagic transformation is the most feared complication of reperfusion therapies and affects stroke outcome even in the absence of such therapies. 29 In the present study, the impact of sICH on stroke mortality was perhaps lower than expected, even though more than 25% of patients were treated with reperfusion therapies. At least in this RENISEN cohort, brain edema and respiratory infections gained importance at the expense of hemorrhagic transformation. Finally, cardiologic complications have been identified as having a significant impact on stroke mortality, accounting for 2–6% mortality within the first 3 months, with the risk being higher within the first two weeks after the cerebrovascular event.30,31 In our study, this feared complication did affect in-hospital mortality, although less than expected, perhaps because of its low incidence (2.2%), which is lower than previously reported. 32 However, as shown in the CART, this complication could have the greatest impact in selected subgroups, such as patients with low NIHSS scores who do not suffer other complications such as brain edema or respiratory infections. Our group has recently described the usefulness of a single ultrasensitive troponin I measurement in the acute phase of stroke to screen for patients at the highest risk of these complications, 33 which might be used, together with the abovementioned clinical factors, to select patients at the highest risk for longer monitoring or preventive therapies.
An important issue in the field of post-stroke complications is the standardization of stroke care in stroke units, which has demonstrated a clear benefit in terms of decreased mortality and improved functional outcome after stroke, 34 with the reduction of post-stroke complications being one of the main mechanisms behind this benefit. 35 Approximately two-thirds of the patients included in this study were treated in stroke units, which, together with the relatively low baseline stroke severity, could explain the low rate of post-stroke complications. Increasing the proportion of stroke patients treated in stroke units should translate into lower rates of mortality and complications.
A large sample size and an accurate and novel statistical analysis are the main strengths of the present study. Our study also has some limitations. First, the absence of longer-term follow-up in the registry makes it impossible to determine the effect of these complications on long-term functional outcome and mortality, questions that should be answered by future studies. Second, some patients were excluded from the analyses because of missing values. Although there were no important differences in the main characteristics between included and excluded cases, we cannot rule out an effect of this missing rate on our results. Third, although the WHO definition was recommended in the Users' Handbook, we cannot verify whether some patients who met the WHO criteria for TIA but had tissue abnormalities on neuroimaging were nonetheless registered as strokes, which would have contributed to the relatively low stroke severity observed. Fourth, some factors related to reperfusion therapies, such as recanalization or reocclusion of the occluded artery or time to treatment, were not recorded in the registry and therefore could not be explored. Fifth, because it is based on complications that occur at different time points after stroke onset, our predictive model cannot easily be translated into clinical practice as a predictive score. However, our main aim was not to create yet another predictive score, but to identify the complications with the highest impact on post-stroke in-hospital mortality and the subsets of patients in whom prevention or early treatment of these complications could be most relevant.
Conclusions
In conclusion, the stroke-associated complications with the greatest impact on in-hospital mortality were brain edema and respiratory infections, followed by sICH and cardiologic complications. The impact of post-stroke complications on in-hospital mortality was highest among patients with lower baseline NIHSS scores. Further efforts should be made to predict or diagnose these complications early so that they can be prevented or treated promptly, especially in patients with low-to-moderate stroke severity. In the near future, specific clinical trials for each of these complications might be an interesting way to identify new therapies that could improve stroke outcome.
Footnotes
Acknowledgements
The Neurovascular Research Laboratory and the Institut d'Investigació Biomèdica de Girona took part in the Spanish stroke research network INVICTUS (RD12/0014/0005).
Declaration of Conflicting Interests
The authors declare that there is no conflict of interest.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by FIS PI15/354, cofinanced by the European Regional Development Fund (FEDER). A.B. is supported by a Rio Hortega contract CM/00265 from the Instituto de Salud Carlos III.
Ethical approval
The ethics committee of Hospital Josep Trueta approved this study.
Informed consent
Informed consent was obtained from the patients for their anonymized information to be published in this article.
Guarantor
JM.
Contributorship
AB designed the study, obtained data, performed and interpreted statistical analysis and drafted the manuscript. DG managed databases, performed and interpreted statistical analyses and critically reviewed the manuscript. TG-B interpreted statistical analysis and critically reviewed the manuscript. MR, JA-S and CM coordinated inclusion at Vall d’Hebron Hospital and critically reviewed the manuscript. JS was the principal investigator of the RENISEN project. He interpreted statistical analysis and critically reviewed the manuscript. JM conceived and designed the study and critically reviewed the manuscript. All authors reviewed and edited the manuscript and approved the final version of the manuscript.
References
Supplementary Material
