Abstract
Objective
To compare an Extreme Gradient Boosting (XGBoost) model with a multivariable logistic regression (LR) model for their ability to predict sepsis after extremely severe burns.
Methods
For this observational study, patient demographic and clinical information was collected from medical records. The two models were evaluated using the area under the curve (AUC) of the receiver operating characteristic (ROC) curve.
Results
Of the 103 eligible patients with extremely severe burns, 20 (19%) were in the sepsis group and 83 (81%) were in the non-sepsis group. The LR model showed that age, admission time, burn index (BI), fibrinogen, and neutrophil-to-lymphocyte ratio (NLR) were risk factors for sepsis. Comparing the AUCs of the ROC curves, the XGBoost model had a higher predictive performance (0.91) than the LR model (0.88). The SHAP visualization tool indicated that fibrinogen, NLR, BI, and age were important features for sepsis in patients with extremely severe burns.
Conclusions
The XGBoost model was superior to the LR model in predictive efficacy. The results suggest that fibrinogen, NLR, BI, and age were correlated with sepsis after extremely severe burns.
Introduction
Extremely severe burns result in widespread damage to skin and tissues, and cause serious physiological and metabolic disturbances that require special treatment and intensive care.1–3 Patients with extremely severe burns have a high risk for developing sepsis and a high mortality rate.4–6 However, there has been little research on risk factors associated with sepsis in patients with extremely severe burns, and the clinical management of these patients continues to pose many challenges.
In recent years, research has shown that certain biomarkers have the potential to predict the development and outcomes of sepsis. For example, indicators such as serum pre-albumin, prothrombin time, and neutrophil to lymphocyte ratio (NLR) have been shown to be useful for predicting and treating sepsis.7–9 However, the pathological process of sepsis that occurs in patients with extremely severe burns differs from that caused by other factors. Therefore, identifying risk factors for sepsis development in patients with extremely severe burns could aid in screening and recognizing populations which are at high risk of sepsis, and ultimately reduce its incidence.
While logistic regression (LR) models have been widely used in the past to predict sepsis, they have limitations in extracting variables from high-dimensional data. By contrast, machine learning algorithms can help avoid overlooking important variables that a regression analysis might miss. The Extreme Gradient Boosting (XGBoost) model is a powerful machine learning algorithm for risk prediction that has been successfully applied to research in several medical fields.10–12 In this study, we retrospectively analysed data from patients with extremely severe burns, and compared an XGBoost model with a multivariable LR model for their ability to predict sepsis after extremely severe burns.
Patients
For this observational study, consecutive patients with extremely severe burns who had been hospitalized at the Red Cross Hospital in Guangzhou, China between January 1st, 2016, and December 31st, 2022 were identified from patient records. Patients included were aged ≥18 years and had burns covering ≥50% of total body surface area (TBSA) or third-degree burns covering ≥20%. Patients whose sepsis had occurred before admission, who had refused treatment within four days of hospitalization, who had received corticosteroids or immunosuppressants, or who had died were excluded from the study. Sepsis was defined according to the Sepsis-3 Consensus Definitions. 13
The reporting of this study conforms to STROBE guidelines. 14 The study obtained formal approval from the Institutional Review Board of Guangzhou Red Cross Hospital. Written informed consent was not required due to the retrospective design of the study and patient data were anonymized prior to analysis.
Data collection
The following data were extracted from the patients’ medical records: sex; age; hospital admission time; TBSA in burns; third-degree burn area; burn index (BI); presence of inhalation injury; blood levels of lactate, albumin, creatinine, fibrinogen, red blood cells (RBCs), neutrophils, platelets, lymphocytes, magnesium, monocytes, high-density lipoprotein (HDL), and C-reactive protein (CRP); and neutrophil-to-lymphocyte ratio (NLR). Blood samples had been collected within 12 hours after admission, and burn area and depth had been assessed by burn surgeons. The BI reflected the severity of the burns and, for this study, was equal to the area of third-degree burns plus half the area of second-degree burns. 15
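The BI definition above can be written as a short calculation. The following is a minimal sketch only: the function name, the input checks, and the example patient are illustrative assumptions and are not taken from the study data.

```python
def burn_index(third_degree_pct: float, second_degree_pct: float) -> float:
    """Burn index (BI) as defined in this study.

    BI = area of third-degree burns + 0.5 * area of second-degree burns,
    with both areas expressed as a percentage of total body surface area.
    """
    if third_degree_pct < 0 or second_degree_pct < 0:
        raise ValueError("burn areas must be non-negative")
    if third_degree_pct + second_degree_pct > 100:
        raise ValueError("combined burn area cannot exceed 100% TBSA")
    return third_degree_pct + 0.5 * second_degree_pct


# Hypothetical patient (not from the study): 30% third-degree, 40% second-degree.
print(burn_index(30, 40))  # 50.0
```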
Statistical analysis
Statistical analysis was performed using SPSS software (version 27.0 for Windows®; IBM Corp, Armonk, NY, USA). A P-value <0.05 was considered to indicate statistical significance. Normally distributed data were presented as mean ± SD, and independent sample t-tests were used to analyse inter-group differences. Non-normally distributed data were presented as medians and interquartile ranges, and Mann-Whitney U tests were used to analyse inter-group differences. Categorical variables were presented as absolute counts and frequencies (%), and χ2 tests were used to compare inter-group differences.
A multivariable LR analysis was performed to identify risk factors for sepsis in patients with extremely severe burns, and regression coefficients, odds ratios (ORs), and 95% confidence intervals (CIs) were calculated. The R rms package was used to plot a nomogram, and the Hosmer-Lemeshow goodness-of-fit test was used to evaluate the predictive model. The XGBoost model was constructed using the XGBoost machine learning algorithm. A Shapley additive explanations (SHAP) analysis was used to interpret the risk factors for sepsis in patients with extremely severe burns and to provide a basis for the reliability of the model results. The two models were evaluated using the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. The Youden index was used as a summary measure of the ROC curve.
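The AUC and Youden index used to compare the models can be illustrated with a minimal sketch. The function names and the toy labels and scores below are assumptions made for illustration; they are not the study's data or the software actually used. The Youden index is taken here as J = sensitivity + specificity − 1, maximised over all score thresholds.

```python
def roc_points(labels, scores):
    """Return (threshold, sensitivity, specificity) for every distinct score.

    A case is predicted positive when its score is >= the threshold.
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative case")
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < t)
        points.append((t, tp / pos, tn / neg))
    return points


def auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden(labels, scores):
    """Return (J, threshold) with J = sensitivity + specificity - 1 maximised.

    If several thresholds give the same J, the highest threshold is returned.
    """
    return max((sens + spec - 1, t) for t, sens, spec in roc_points(labels, scores))


# Toy data for illustration only: 1 = sepsis, 0 = no sepsis.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
print(auc(labels, scores))     # 0.9375
print(youden(labels, scores))  # (0.75, 0.7)
```

On this reading, the Youden index is a value between 0 and 1 attached to a specific cut-off, so it is useful to report both the index and the threshold at which it was obtained.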
Results
A total of 106 patients were identified, of whom three were excluded. Of the 103 remaining patients, 20 (19%) were assigned to the sepsis group and 83 (81%) to the non-sepsis group (Figure 1). Values for age, admission time >8 hours, area of third-degree burns, BI, fibrinogen, and CRP were all greater in the sepsis group compared with the non-sepsis group (Table 1). However, levels of albumin, lymphocytes and HDL were lower in the sepsis group compared with the non-sepsis group. In the multivariable analysis, we used BI because there was a positive correlation between third-degree burns and BI. In addition, we used NLR as a clinically meaningful index because it has been reported to be predictive for sepsis in its early stages.

Flow chart of the study population.
Demographic characteristics of patients with extremely severe burns.
Data are expressed as n (%), mean ± standard deviation, or median (25th, 75th percentiles).
TBSA, total body surface area; RBCs, red blood cells; PLT, platelets; NLR, neutrophil-to-lymphocyte ratio; HDL, high-density lipoprotein; CRP, C-reactive protein; ns, not statistically significant.
The multivariable LR analysis showed that age, admission time, BI, fibrinogen, and NLR were independent risk factors for the occurrence of sepsis (Figure 2). BI was the most significant risk factor for sepsis (P = 0.007), and the risk for sepsis in patients with extremely severe burns admitted >8 hours after a burn injury was 4.7-fold greater than in patients admitted within 8 hours after a burn injury. In addition, patients with extremely severe burns and high fibrinogen levels had a 1.9-fold greater risk of developing sepsis than those with low fibrinogen levels.
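Fold-changes such as these come from exponentiating the LR coefficients. The sketch below shows the conversion of a coefficient and its standard error into an OR with a Wald 95% CI. The function name and the numeric inputs are hypothetical and are not values from the fitted model. Strictly speaking, an OR is a ratio of odds rather than of risks, so it approximates a risk ratio only when the outcome is uncommon.

```python
import math


def odds_ratio_ci(beta: float, se: float, z: float = 1.96):
    """Convert a logistic regression coefficient into an odds ratio.

    Returns (OR, lower, upper), where the limits are the Wald interval
    exp(beta - z * se) and exp(beta + z * se); z = 1.96 gives a 95% CI.
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical coefficient and standard error, NOT taken from the study.
or_, lower, upper = odds_ratio_ci(beta=math.log(2.0), se=0.3)
print(f"OR = {or_:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```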

Forest plot showing the results of the multivariable logistic regression (LR) analysis. Age, admission time, burn index (BI), fibrinogen, and neutrophil-to-lymphocyte ratio (NLR) were independent risk factors for the occurrence of sepsis. OR, odds ratio; CI, confidence interval.
A nomogram, constructed to illustrate the risk for sepsis in patients with extremely severe burns, indicated that, as the fibrinogen level increased by 0.5 units, the nomogram score increased by 7.5 points (Figure 3). An ROC curve showed that the AUC of the LR model was 0.88 (95% CI: 0.82–0.95) and the maximum Youden index score was 0.22 (Figure 4). The Hosmer-Lemeshow goodness-of-fit test (i.e., H-L test) was conducted on the predictive model, with χ2 = 13.35 and P = 0.1, demonstrating that the model had a good fit.

Individualized nomogram of the logistic regression (LR) model for the risk of sepsis in patients with extremely severe burns. NLR, neutrophil-to-lymphocyte ratio; BI, burn index. *P < 0.05; **P < 0.01.

Receiver operating characteristic (ROC) curve of the logistic regression (LR) model. Area under the curve (AUC) was 0.88 (P < 0.001; 95% CI, 0.82, 0.95). Youden index was 0.22 (95% CI, 0.83, 0.90).
The training set (80% of the data) was used iteratively to obtain the best parameters for the XGBoost model. The sensitivity of the XGBoost model was 82% and its specificity was 100%. The AUC of the ROC curve was 0.91 (95% CI: 0.79, 1.0) and the maximum Youden index score was 0.08 (Figure 5).
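The 80% training split and the sensitivity and specificity calculation can be sketched as follows. This is an illustration under stated assumptions: the split is stratified by outcome and uses a fixed random seed, neither of which is reported in the text, and the function names are hypothetical. Only the group sizes (20 sepsis, 83 non-sepsis) are taken from the study.

```python
import random


def stratified_split(labels, train_fraction=0.8, seed=0):
    """Return (train_idx, test_idx), splitting each outcome class separately
    so that both parts keep roughly the same proportion of sepsis cases."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(train_fraction * len(idx))
        train += idx[:cut]
        test += idx[cut:]
    return sorted(train), sorted(test)


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both outcome classes must be present")
    return tp / (tp + fn), tn / (tn + fp)


# Group sizes from the study: 20 sepsis (1) and 83 non-sepsis (0) patients.
labels = [1] * 20 + [0] * 83
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 82 21

# Hypothetical predictions for four held-out patients, for illustration only.
print(sensitivity_specificity([1, 1, 0, 0], [1, 0, 0, 0]))  # (0.5, 1.0)
```

Under these assumptions the held-out set contains only 21 patients, four of them with sepsis, so estimates of sensitivity and specificity from such a split would be expected to carry wide uncertainty.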

Receiver operating characteristic (ROC) curve of the XGBoost model. Area under the curve (AUC) was 0.91 (P < 0.001; 95% CI, 0.82, 0.95). Youden index was 0.08 (95% CI, 0.82, 1.0).
A SHAP algorithm was used to obtain the importance of each predictor variable to the predicted results of the XGBoost model. The variable importance plot lists the most important variables in descending order (Figure 6). Fibrinogen had the strongest predictive value for all predicted levels, followed by TBSA in burns, NLR, BI, and age. In addition, to detect positive and negative relationships between predicted values and target outcomes, SHAP values were applied to show sepsis risk factors (Figure 7). The horizontal position on the graph shows whether the effect of the value was associated with a higher or lower prediction, and the colour shows whether the variable was high (red) or low (blue) for that observation. Our analysis showed that an increase in fibrinogen, NLR, BI, age, and admission time had a positive effect and pushed the prediction towards the occurrence of sepsis.
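The idea behind SHAP values can be shown with a brute-force calculation of exact Shapley values for a single observation. This sketch is not the tree-specific algorithm that SHAP software typically applies to XGBoost models, and the toy risk score, feature values, and baseline are hypothetical. Features left out of a coalition are set to a baseline value, which is one common simplifying assumption.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one observation x.

    Each value is the weighted average, over all coalitions of the other
    features, of the change in prediction when feature i is switched from
    its baseline value to its observed value.
    """
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                total += weight * (value(set(s) | {i}) - value(set(s)))
        phi.append(total)
    return phi


def toy_model(z):
    """Hypothetical risk score with an NLR x BI interaction; NOT the fitted model."""
    fibrinogen, nlr, bi = z
    return 0.1 * fibrinogen + 0.02 * nlr + 0.01 * bi + 0.001 * nlr * bi


x = [6.0, 15.0, 60.0]         # hypothetical patient: fibrinogen, NLR, BI
baseline = [3.0, 5.0, 40.0]   # hypothetical reference values
phi = shapley_values(toy_model, x, baseline)
print(phi)
print(sum(phi), toy_model(x) - toy_model(baseline))
```

The two numbers on the last line agree up to rounding, which reflects the additive property that gives the method its name: the per-feature contributions sum to the difference between the prediction for the patient and the prediction at the baseline. A positive contribution pushes the prediction towards sepsis and a negative one pushes it away.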

SHAP summary plot of the features of the XGBoost model. The higher the SHAP value of a feature, the higher the predicted probability of sepsis. Fibrinogen had the strongest predictive value for all predicted levels, followed by total body surface area (TBSA) in burns, neutrophil-to-lymphocyte ratio (NLR), burn index (BI), and age. PLT, platelets; CRP, C-reactive protein; III, area of third-degree burns; RBC, red blood cells.

Analysis of XGBoost model features using SHAP values to show sepsis risk factors. An increase in fibrinogen, NLR, BI, age, and admission time had a positive effect and pushed the prediction towards the occurrence of sepsis. SHAP, Shapley additive explanations analysis; TBSA, total body surface area in burns; NLR, neutrophil-to-lymphocyte ratio; TBSI, total burn surface involved; PLT, platelets; CRP, C-reactive protein; III, area of third-degree burns; RBC, red blood cells.
Discussion
Sepsis is a common complication in patients with extremely severe burns, and can lead to septic shock, immune suppression, multiple organ dysfunction, and even death.4–6 Therefore, preventing sepsis in patients with extremely severe burns is of great clinical importance.16–19 In the present study, the incidence of sepsis among 103 patients with extremely severe burns was 19%. Identifying risk factors for sepsis in these patients may help to improve their outcomes. We showed that the XGBoost model was superior to the LR model in its predictive efficacy as assessed by the AUC of the ROC curves. Risk factors identified by both models included fibrinogen, NLR, BI, and age. Previous studies have also found that age was a factor affecting the prognosis of sepsis in patients with extremely severe burns, and was positively correlated with burn mortality. 20
In our study, the LR model showed that BI was the most significant risk factor for sepsis in patients with extremely severe burns, which was consistent with findings from a similar study. 21 In addition, the LR model showed that the risk for sepsis in patients with extremely severe burns admitted >8 hours after a burn injury was nearly five times greater than in patients admitted within 8 hours after a burn injury. The SHAP interpreter of the XGBoost model also showed that the longer the interval between burn injury and admission, the higher the risk of the patient developing sepsis. Indeed, admission time following a burn injury is an important risk factor, and timely treatment of a burn injury, including effective fluid resuscitation, anti-shock therapy, early burn wound management, infection control, and organ protection, is important in preventing sepsis.22,23
Fibrinogen is a protein synthesized in the liver that is composed of three pairs of peptide chains and plays an important role in the coagulation/anticoagulation process. Fibrinogen is critical for blood clotting, and an elevated level may increase blood coagulation and become an important risk factor for thrombotic disease. Studies have found that fibrinogen has important diagnostic and predictive value in various diseases, such as cardiovascular disease, cancer, and diabetes.24–26 Elevated fibrinogen levels are an independent risk factor for death after sepsis, suggesting that they may have prognostic value for sepsis.27,28 The results of this study showed that patients with extremely severe burns and high fibrinogen levels had nearly a two-fold greater risk of developing sepsis than those with low fibrinogen levels. However, the pathological mechanism by which elevated fibrinogen levels increase the risk for sepsis remains unclear.
NLR reflects the dynamic balance between neutrophils and lymphocytes in peripheral blood and in recent years has received attention as a new type of inflammatory index. For example, in a disease state, changes in NLR may reflect the severity and prognosis of the disease.29–31 In addition, NLR is considered to be a reliable biomarker for diagnosing bacteraemia and sepsis.32–34 Studies have shown that a high NLR was associated with a poor prognosis in sepsis patients and was an independent risk factor for predicting sepsis patient mortality. 35 The results of our study showed that NLR was an independent risk factor for sepsis in severely burned patients, and so provides a new direction for future treatment strategies.
The study had several limitations. For instance, its sample size was small, with data from only 103 patients included, which may have led to unintentional bias. Additionally, the study was conducted at a single centre in China, which may limit the generalizability of its findings. Importantly, although the study found that fibrinogen, BI, age, and NLR were high-risk factors for sepsis in patients with extremely severe burns, the clinical significance of these findings has not been fully explored. Despite these limitations, the findings of the present study are important and may provide new directions for future research and clinical applications. Further large-scale, prospective, randomised studies are required to confirm our results.
In summary, this study showed that the XGBoost model was superior to the LR model in predictive efficacy. The results suggest that fibrinogen, NLR, BI, age, and time of admission were correlated with sepsis after extremely severe burns. The appropriate management of patients with abnormal indices upon admission may help to reduce their risk of sepsis.
Footnotes
Acknowledgements
We wish to thank our colleagues in the Departments of burns and plastic surgery for their cooperation with this study.
Declaration of conflicting interests
The authors declare that there are no conflicts of interest.
Funding
This work was supported by research grant 202102020481 (P.L.) from Guangzhou Science and Technology Plan Project, research grant 202102010074 (T.Z.) from Guangzhou Science and Technology Plan Project School (Institute) Joint Funding Project, research grant RHPG05 from Guangzhou Research Hospital Construction Project, research grant 2023A03J0524 (Y.H.H.) from Guangzhou Science and Technology Plan Project School (Institute) Joint Funding Project, and research grant from 2018 Project Funds of Guangzhou Red Cross Hospital.
