Machine learning-driven nomogram for predicting viral encephalopathy risk in SFTS patients

Abstract

Objective

To develop a machine learning-driven predictive model for early identification of viral encephalopathy risk in SFTS patients.

Materials and methods

This retrospective study included 192 SFTS patients (58 with viral encephalopathy) treated at Nanjing Second Hospital between June 2022 and December 2024. The Boruta algorithm and SHAP-based recursive feature elimination with cross-validation (SHAP-RFE-CV) identified nine candidate predictors, which were refined by LASSO regression (λ.1se = 0.018). Multivariate logistic regression was used to identify risk factors and to construct a dynamic nomogram. Model performance was compared with that of the APACHE II score using the AUC, calibration curves, and decision curve analysis (DCA). Principal component analysis (PCA) was used to explore biomarker interactions.

Results

Key predictors included viral load, LDH, BNP, IL-8, APTT, and CD4⁺ T cells. BNP (OR=3.85) and IL-8 (OR=8.97) were independent risk factors; LDH (OR=1.18, P=0.051) and prolonged APTT (OR=1.54) showed positive but nonsignificant associations, whereas higher CD4⁺ T-cell counts were protective (OR=0.55). The nomogram achieved higher AUCs than APACHE II (training: 0.958 vs. 0.839; validation: 0.974 vs. 0.926). Calibration curves (Hosmer-Lemeshow P = 0.48 and 0.123 in the training and validation sets, respectively) and DCA supported its clinical utility. PCA identified three axes: tissue injury (PC1: LDH/AST, 53.3% of variance), inflammation/coagulation (PC2: IL-8/APTT, 13.9%), and immune dysregulation (PC3: CK, 11.6%). IL-8 exhibited a nonlinear threshold effect (cutoff = 90.1 pg/mL).

Discussion

The nomogram integrates routinely measured biomarkers selected to limit multicollinearity and achieved higher discrimination than a traditional severity score. PC1 and PC2 together explained 67.2% of the variance in the biomarker panel, highlighting tissue damage and inflammatory-coagulation dysregulation as candidate central mechanisms.

Conclusion

This study provides a clinically actionable nomogram for risk stratification of SFTS-associated encephalopathy. PCA insights suggest mechanistic interactions among biomarkers and point to potential therapeutic targets.

Keywords

severe fever with thrombocytopenia syndrome (SFTS); viral encephalopathy; machine learning; prediction model; biomarkers; risk stratification

1. Introduction

Severe fever with thrombocytopenia syndrome (SFTS), caused by the SFTS virus (SFTSV), is a life-threatening tick-borne disease with frequent neurological complications such as encephalitis.¹ Recent bibliometric analyses have systematically mapped global trends and hotspots in SFTSV clinical research, identifying antiviral therapy, immunotherapy, and virus transmission mechanisms as the primary focuses of current investigations.² Early prediction of SFTS-associated encephalopathy remains challenging because specific prognostic tools are lacking.³ Current severity scores, such as APACHE II, show limited specificity and fail to capture the dynamic biomarker interactions critical to disease progression.⁴,⁵

Existing prognostic models often rely on univariate or traditional multivariate analyses, which may overlook complex feature dependencies.⁶ For instance, lactate dehydrogenase (LDH) and viral load (VL) are established prognostic markers, but their synergistic effects with coagulation (e.g., APTT) and inflammatory cytokines (e.g., IL-8) remain unclear. Additionally, most studies do not use advanced feature selection algorithms such as Boruta or SHAP-based recursive elimination, which can mitigate overfitting and improve model precision.⁷⁻¹⁰

To address these gaps, we developed a hybrid machine learning strategy combining Boruta and SHAP-based recursive feature elimination (RFE-CV) to identify key predictors (e.g., VL, LDH, CD4⁺ T cells). Principal component analysis (PCA) was further employed to decode latent biomarker interactions, revealing three principal components reflecting underlying patterns of biomarker covariation: tissue injury (LDH/AST-driven), inflammation-coagulation crosstalk (IL-8/APTT-linked), and immune dysregulation.

Compared with the APACHE II score, our dynamic nomogram showed higher discrimination (validation AUC: 0.974 vs. 0.926) and greater clinical utility. In logistic regression on principal component scores, PC1 (OR=93.1) and PC2 (OR=6.34) were significant encephalitis risk factors. This study provides a translational tool for precision risk stratification and mechanistic insights into SFTS-associated encephalopathy.

2. Materials and methods

2.1. Participants

This single-center retrospective cohort study included 192 SFTS patients diagnosed by qRT-PCR at Nanjing Second Hospital between June 2022 and December 2024. Viral encephalopathy was defined as SFTS accompanied by any of the following neurological manifestations: (1) muscle tremors (tongue, jaw, or limbs); (2) cognitive dysfunction (orientation or memory deficits, slowed reactions, or aphasia); (3) altered consciousness (drowsiness or coma); or (4) seizures. Patients were randomly assigned to training (70%) and validation (30%) sets and were further classified into encephalitic and nonencephalitic groups according to the presence or absence of viral encephalopathy. The inclusion criteria were laboratory-confirmed SFTSV infection (positive serum viral RNA), age ≥18 years, and sufficiently complete clinical data. The exclusion criteria were other central nervous system diseases (e.g., stroke, brain tumor, or epilepsy), concurrent active viral infections (e.g., HIV, HBV, or HCV), and >20% missing clinical data. Missing data were handled by multiple imputation by chained equations (MICE), a flexible approach that imputes each incomplete variable in turn from its conditional distribution given the other variables, using the R package ‘mice’. To ensure reproducibility, the random seed was set to 123, and five imputed datasets were generated (m = 5). The imputation method was selected automatically by the ‘mice’ function according to variable type: predictive mean matching for continuous variables and logistic regression (or polytomous logistic regression) for categorical variables. After imputation, the ‘complete’ function (with the default action = 1) was used to extract the first completed dataset.
This completed dataset preserves the distributional characteristics of the original data and was used for all subsequent statistical analyses; because a single imputed dataset was analyzed, between-imputation uncertainty was not pooled across the five datasets.
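Predictive mean matching, the method used for continuous variables in the imputation step above, can be illustrated with a minimal Python sketch. It assumes a single complete predictor and an ordinary least-squares fit; the R ‘mice’ implementation additionally draws the regression parameters and samples among several donors, which this sketch simplifies. The function names and toy values are hypothetical. The property it shows is that every imputed value is a value actually observed in another patient.

```python
import random

# Minimal predictive mean matching (PMM) sketch for ONE continuous variable.
# Assumptions (not the 'mice' internals): one complete predictor x, an
# ordinary least-squares fit, and a random draw among the n_donors closest.


def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def pmm_impute(x, y, n_donors=3, seed=123):
    """Replace None entries of y with observed y values from matched donors."""
    rng = random.Random(seed)
    observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    a, b = fit_line([xi for xi, _ in observed], [yi for _, yi in observed])
    # predicted mean for every observed case, paired with its observed value
    donors = [(a + b * xi, yi) for xi, yi in observed]
    completed = []
    for xi, yi in zip(x, y):
        if yi is not None:
            completed.append(yi)
            continue
        target = a + b * xi
        closest = sorted(donors, key=lambda d: abs(d[0] - target))[:n_donors]
        completed.append(rng.choice(closest)[1])  # an OBSERVED value is donated
    return completed


if __name__ == "__main__":
    # Hypothetical toy data with two missing outcome values.
    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    y = [210.0, 250.0, None, 330.0, None, 420.0]
    print(pmm_impute(x, y))
```

Because imputed values are borrowed from real donors, PMM cannot produce implausible values (for example, negative enzyme levels), which is one reason it is the usual default for continuous laboratory variables.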

2.2. Data collection

Clinical and laboratory data collected within 24 hours of admission included the following. Demographics: age, sex, and underlying conditions (hypertension, diabetes, cancer, etc.). Symptoms: fever spikes, neurological deficits (tremors, cognitive impairment, seizures), consciousness changes, etc. Laboratory markers:

Inflammatory: CRP, PCT, IL-6/8/10/IFN-γ.

Liver: ALT, AST, TBil, ALB.

Cardiac: BNP, LDH, CK/CK-MB/cTnI.

Coagulation: PT, APTT, FIB, FDP, D-dimer.

Immune: CD4⁺/CD8⁺ T cells, NK cells.

Virological: SFTSV viral load (VL).

Severity scores: APACHE II and RISK mortality risk assessment.

2.3. Variable selection and model creation

To build robust predictive models and analyze the relationships between variables, we used a mix of traditional statistical methods and machine learning.

1. Univariate and multivariate logistic regression analyses: These conventional regression methods were used to identify significant predictors of viral encephalopathy. They yield interpretable results and form the basis of the nomogram.

2. The Boruta algorithm compares the importance of each feature in a random forest with that of randomly permuted ‘shadow’ copies of the features and, through repeated statistical testing, retains only features that consistently outperform the shadows, thereby enhancing model interpretability and generalizability.

3. SHAP value-driven recursive feature elimination (RFE-CV) uses SHAP values to quantify the contribution of features to the model output, recursively eliminates features with the lowest SHAP contributions, and directly optimizes feature subsets. Combined with cross-validation (CV) to repeatedly validate feature importance, this reduces randomness bias and ensures the robustness of the results.

4. LASSO regression analysis is particularly useful for datasets with many correlated variables. It performs variable selection by shrinking less important coefficients to zero, resulting in a more concise model. This method is beneficial for handling multicollinearity and improving model interpretability.

5. PCA is an unsupervised method that summarizes the covariance structure of multivariate data in a small number of orthogonal components, reducing dimensionality, noise, and redundancy and facilitating visualization of high-dimensional relationships. It is therefore commonly used to reveal and simplify the core relationship structure of multivariate data.

By comparing these methods, we aim to select the optimal model that balances predictive accuracy and clinical interpretability. Dynamic nomograms are constructed using the most important predictive factors identified through these analyses.

First, in the training set, variables that differed significantly (P < 0.05) between the encephalopathy and nonencephalopathy groups at baseline in univariate analysis, or that were considered clinically meaningful, were entered into the Boruta algorithm. Second, the variables not rejected by the Boruta algorithm were further screened via SHAP value-driven recursive feature elimination (RFE-CV). Third, LASSO regression (λ.1se, 10-fold cross-validation) was applied to the retained features. Finally, multivariate logistic regression combined with variance inflation factor (VIF) analysis was used to select variables (inclusion criteria: P < 0.05 or VIF < 5) for constructing the dynamic nomogram. Correlation and mediation analyses were performed for each variable, and ANOVA, AIC, and BIC were used to compare the key-variable model with the full-variable model and select the optimal model. Unsupervised PCA was then performed on these variables.
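The SHAP-RFE-CV step can be sketched in Python under simplifying assumptions: the model is a plain logistic regression fitted by gradient descent, and SHAP values use the closed form available for linear models with independent features, φ_j = β_j(x_j − x̄_j) on the log-odds scale. The study's actual learner, SHAP implementation, and stopping rule are not specified here, so all names are hypothetical; the sketch only shows the loop of fitting, ranking features by mean |SHAP|, dropping the weakest, and scoring each subset by cross-validated accuracy.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Batch gradient-descent logistic regression -> (intercept, coefs)."""
    n, p = len(X), len(X[0])
    b0, b = 0.0, [0.0] * p
    for _ in range(epochs):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            err = sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, xi))) - yi
            g0 += err
            for j in range(p):
                g[j] += err * xi[j]
        b0 -= lr * g0 / n
        b = [bj - lr * gj / n for bj, gj in zip(b, g)]
    return b0, b


def mean_abs_shap(coefs, X):
    """Mean |SHAP| per feature for a linear log-odds model.

    Assumes independent features, so SHAP_j = beta_j * (x_j - mean_j).
    """
    n = len(X)
    imp = []
    for j, bj in enumerate(coefs):
        mu = sum(row[j] for row in X) / n
        imp.append(abs(bj) * sum(abs(row[j] - mu) for row in X) / n)
    return imp


def cv_accuracy(X, y, k=5, seed=123):
    """k-fold cross-validated accuracy at a 0.5 probability cutoff."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for f in range(k):
        test = idx[f::k]
        held = set(test)
        train = [i for i in idx if i not in held]
        b0, b = fit_logistic([X[i] for i in train], [y[i] for i in train])
        for i in test:
            z = b0 + sum(bj * xj for bj, xj in zip(b, X[i]))
            correct += int((z > 0) == (y[i] == 1))
    return correct / len(X)


def shap_rfe_cv(X, y, names):
    """Drop the lowest mean-|SHAP| feature each round; return the best CV subset."""
    active = list(range(len(names)))
    history = []
    while active:
        sub = [[row[j] for j in active] for row in X]
        history.append(([names[j] for j in active], cv_accuracy(sub, y)))
        _, coefs = fit_logistic(sub, y)
        imp = mean_abs_shap(coefs, sub)
        active.pop(imp.index(min(imp)))
    best = max(history, key=lambda h: h[1])  # ties -> larger subset (listed first)
    return best, history
```

Selecting the subset with peak cross-validated accuracy corresponds to the selection criterion reported in Section 3.2; breaking ties in favor of the larger subset is an arbitrary choice made for this sketch.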

2.4. Verification of the dynamic nomogram

In the training and validation sets, the discriminatory performance of the nomogram was compared with that of the APACHE II score using receiver operating characteristic (ROC) curves and the area under the curve (AUC). Calibration curves were used to assess agreement between predicted and observed probabilities. Decision curve analysis (DCA) was performed in the validation set to assess whether using the model to guide clinical decisions provides a net benefit beyond its statistical performance.

2.5. Exploration of the relationships between key factors via principal component analysis (PCA)

Correlation analysis was performed among the model variables and the mortality risk coefficient (Risk) from the ROC curve model, and variables with strong correlations (|r| > 0.5) were included in the PCA to identify latent structures. After the predictive variables were standardized, the number of principal components to retain was selected on the basis of the cumulative explained variance (>70%) and the magnitude of the eigenvalues. Specifically, when the cumulative variance approaches the critical threshold (e.g., 70–80%):

1. If removing components with eigenvalues slightly <1 causes a significant decrease in cumulative variance (>10%), the component should be retained.

2. Components with eigenvalues between 0.9 and 1.1 that have no practical interpretation can be removed.¹¹

Logistic regression was used to assess the association between principal component scores and encephalopathy, with results reported as adjusted odds ratios per 1-standard-deviation increase. The relationships among key variables within each principal component were explored on the basis of absolute loadings >0.4 or clinically plausible relationships.
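The retention rules above can be expressed as a short Python sketch that works on the eigenvalues of the correlation matrix (for standardized variables, these sum to the number of variables). The eigenvalues in the example are hypothetical and the helper names are illustrative; the 0.9–1.1 grey zone only flags components for the clinical-relevance judgement described in rule 2 rather than deciding automatically.

```python
def explained_variance(eigenvalues):
    """Per-component and cumulative proportion of variance explained."""
    total = sum(eigenvalues)
    ratios = [ev / total for ev in eigenvalues]
    cumulative, running = [], 0.0
    for r in ratios:
        running += r
        cumulative.append(running)
    return ratios, cumulative


def n_components(eigenvalues, threshold=0.70):
    """Smallest number of leading components with cumulative variance > threshold."""
    _, cumulative = explained_variance(eigenvalues)
    for k, c in enumerate(cumulative, start=1):
        if c > threshold:
            return k
    return len(eigenvalues)


def grey_zone(eigenvalues, low=0.9, high=1.1):
    """1-based indices of components whose eigenvalue lies in [low, high]."""
    return [i for i, ev in enumerate(eigenvalues, start=1) if low <= ev <= high]


if __name__ == "__main__":
    # Hypothetical eigenvalues for 7 standardized biomarkers (they sum to 7).
    eig = [3.73, 0.97, 0.81, 0.60, 0.40, 0.30, 0.19]
    ratios, cum = explained_variance(eig)
    print([round(c, 3) for c in cum])
    print(n_components(eig), grey_zone(eig))
```

In this hypothetical example, the cumulative-variance rule keeps three components even though the third eigenvalue is below 1, which is the situation that rule 1 is meant to handle.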

2.6. Statistical analysis

Statistical analyses were performed in R (v4.3.3). Sample size: as a rule of thumb, at least 5–20 outcome events are required for each independent variable in a multivariate analysis. Categorical data were analyzed with χ² tests, supplemented by Fisher’s exact test for small samples; normally distributed continuous data were compared with independent t-tests (mean ± SD), and non-normally distributed data with Mann-Whitney U tests (median [P25, P75]). Variables were screened via Boruta and SHAP-RFE-CV. LASSO regression (λ.1se) with 10-fold cross-validation was used to optimize variable selection. Model performance was evaluated by the AUROC, calibration curves, and decision curves. PCA was used to identify latent structures (cumulative variance >70%), and restricted cubic splines were used to assess nonlinear relationships between variables and outcome.

All statistical tests were two-sided, with statistical significance set at P < 0.05. The main R packages used were gtsummary, tidyverse, dplyr, rms, dcurves, pROC, nomogramFormula, and ggplot2. All analyses were reported according to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) guidelines.¹²
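The events-per-variable (EPV) rule of thumb in the sample-size statement above is simple arithmetic; a small Python sketch (helper names hypothetical) makes it concrete using the total of 58 encephalopathy events reported in Section 3.1. Because the model was fitted in the training set only, the EPV actually available at fitting time is lower than these whole-cohort figures.

```python
def events_per_variable(n_events, n_predictors):
    """Outcome events available per candidate predictor (EPV)."""
    return n_events / n_predictors


def max_predictors(n_events, min_epv=10):
    """Largest number of predictors that keeps EPV at or above min_epv."""
    return n_events // min_epv


if __name__ == "__main__":
    # Whole-cohort total from Section 3.1: 58 encephalopathy events.
    print(round(events_per_variable(58, 7), 2))   # 8.29 for the seven-predictor model
    print(round(events_per_variable(58, 30), 2))  # 1.93 for the thirty candidates
    print(max_predictors(58, 10), max_predictors(58, 5))  # 5 11
```

The low EPV for the 30 candidate variables is the usual argument for data-driven screening (Boruta, RFE, LASSO) before fitting the final multivariable model.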

3. Results

3.1. Clinical baseline information

This study ultimately included 192 patients with SFTS, among whom 58 (30.2%) developed viral encephalopathy. There were no significant differences between the training set (n=134) and the validation set (n=58) in demographic characteristics, underlying conditions, or laboratory indicators, indicating balanced baseline characteristics between the two sets (Supplemental Figure 1, Supplemental Table 1). Compared with the nonencephalitis group, the encephalitis group had higher viral loads (VLs) (P<0.001), LDH levels (P<0.001), BNP levels (P<0.001), and IL-8 levels (P<0.001), whereas CD4⁺ T-cell counts were significantly lower (P<0.001) (Supplemental Table 2).

3.2. Prediction model feature selection

To identify potential risk factors for encephalopathy in patients with severe fever with thrombocytopenia syndrome, we first screened 30 candidate variables using univariate analysis followed by the Boruta algorithm. This process classified 26 variables (including PLT and CD4⁺ T cells) as significant predictors, whereas 4 variables (including CRP and CD8⁺ T cells) remained tentative, with the others rejected (Figure 1(a)). We then applied SHAP value-driven recursive feature elimination to refine the selection, ultimately identifying 9 optimal features (VL, urea, AST, etc.) that yielded the peak cross-validation accuracy (0.886; Figure 1(c) and (d)). Variance inflation factor analysis confirmed minimal multicollinearity among these variables (all VIFs < 5; Figure 1(d)). LASSO regression identified the optimal model at λ.1se = 0.018, which retained 7 clinically relevant variables (VL, LDH, CD4⁺ T cells, etc.; Figure 1(e) and (f)). These findings highlight VL, LDH, and CD4⁺ T cells as pivotal biomarkers for encephalopathy risk stratification in this population.

Figure 1.

Machine learning-driven biomarker pipeline for predicting viral encephalopathy in SFTS patients.
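The λ.1se rule used to fix the LASSO penalty can be sketched as follows: among the penalties whose mean cross-validated error lies within one standard error of the minimum, take the largest (most parsimonious) one. This mirrors the rule behind `lambda.1se` in R's glmnet, but the function here is a hypothetical stand-alone helper, and the cross-validation curve in the example is invented for illustration.

```python
def lambda_1se(lambdas, cv_means, cv_ses):
    """Return (lambda_min, lambda_1se) from a cross-validated error curve.

    lambda_1se is the LARGEST lambda whose mean CV error is no more than the
    minimum mean CV error plus that minimum's standard error.
    """
    i_min = min(range(len(lambdas)), key=lambda i: cv_means[i])
    cutoff = cv_means[i_min] + cv_ses[i_min]
    eligible = [lam for lam, m in zip(lambdas, cv_means) if m <= cutoff]
    return lambdas[i_min], max(eligible)


if __name__ == "__main__":
    # Invented CV curve (e.g., mean deviance across 10 folds) for illustration.
    lams = [0.001, 0.005, 0.01, 0.02, 0.04, 0.08]
    means = [0.40, 0.38, 0.37, 0.385, 0.42, 0.50]
    ses = [0.02] * len(lams)
    print(lambda_1se(lams, means, ses))
```

Choosing λ.1se rather than the error-minimizing λ trades a statistically negligible loss in cross-validated fit for a sparser model, which is why fewer variables (7) survived than entered the LASSO step.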

3.3. Constructing models

To identify risk factors for encephalopathy in patients with severe fever with thrombocytopenia syndrome, we performed multivariate logistic regression on seven key laboratory markers. Continuous variables were rescaled to clinically meaningful units (LDH and AST per 100 U/L, APTT per 10 s, and CD4⁺ T cells per 100 cells/μL) so that the odds ratios (ORs) would be interpretable. Elevated BNP (OR=3.85, 95% CI: 1.54–11.4, p=0.003) and IL-8 (OR=8.97, 95% CI: 1.86–55.2, p=0.005) were significantly associated with encephalopathy, indicating their potential as independent risk factors. Elevated LDH (OR=1.18, 95% CI: 1.00–1.43, p=0.051) showed a borderline association, and prolonged APTT (OR=1.54, 95% CI: 0.81–3.15, p=0.192) did not reach statistical significance. Notably, higher CD4⁺ T-cell counts were associated with a lower risk of encephalopathic complications (OR=0.55, 95% CI: 0.38–0.87, p=0.012) (Table 1).

Table 1.

Multivariate logistic regression analysis of risk factors for encephalopathy in severe fever with thrombocytopenia syndrome.

Characteristic	OR	95% CI	p-value
LDH (per 100 U/L)	1.18	1.00, 1.43	0.051
BNP	3.85	1.54, 11.4	0.003
VL	1.47	0.96, 2.31	0.078
CD4 (per 100 cells/μL)	0.55	0.38, 0.87	0.012
APTT (per 10s)	1.54	0.81, 3.15	0.192
AST (per 100 U/L)	1.25	0.82, 1.97	0.310
IL-8	8.97	1.86, 55.2	0.005

Notes. OR = odds ratio; CI = confidence interval; LDH = lactate dehydrogenase; BNP = B-type natriuretic peptide; VL = viral load; APTT = activated partial thromboplastin time; AST = aspartate aminotransferase; IL-8 = interleukin-8. BNP and VL values were log-transformed. Bold font indicates statistically significant associations (P < 0.05).
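Because the log-odds are linear in each predictor, an OR reported per 100 U/L or per 10 s can be re-expressed for any other change by exponentiation: OR(Δ) = OR(block)^(Δ/block). A minimal Python sketch (helper name hypothetical) follows. It applies only to predictors entered linearly, so for the log-transformed BNP and VL the change must be expressed in log units, and each result is conditional on the other covariates being held fixed.

```python
import math


def rescale_or(or_per_block, block, change):
    """Odds ratio for an arbitrary `change`, given the OR per `block` units.

    In a logistic model log(OR) is linear in the predictor, so
    OR(change) = OR(block) ** (change / block).
    """
    return math.exp(math.log(or_per_block) * change / block)


if __name__ == "__main__":
    # LDH: OR 1.18 per 100 U/L (Table 1); the LDH gap between the two example
    # patients described with the nomogram is 1699 - 813 = 886 U/L.
    print(rescale_or(1.18, 100, 1699 - 813))
    # APTT: OR 1.54 per 10 s, re-expressed per 1 s.
    print(rescale_or(1.54, 10, 1))
```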

To estimate the probability of viral encephalopathy in patients with severe fever with thrombocytopenia syndrome, we developed a dynamic nomogram incorporating the key biomarkers. For Patient A, with IL-8 at 27.33 pg/mL, AST at 564.9 U/L, APTT at 38.6 s, CD4⁺ T cells at 77 cells/μL, a viral load (VL) of 0.5 log10 copies/mL, BNP at 40.7 pg/mL, and LDH at 813 U/L, the model predicted a 34.9% probability of viral encephalopathy (Figure 2(a)). In contrast, Patient B had a markedly higher predicted risk of 80.8%, with IL-8 at 196.01 pg/mL, AST at 252.9 U/L, APTT at 39.4 s, CD4⁺ T cells at 86 cells/μL, VL at 1.0 log10 copies/mL, BNP at 199.4 pg/mL, and LDH at 1699 U/L (Figure 2(b)). The nomogram assigns points to each variable and sums them into a total score, which is then converted into a probability estimate. This visualization shows the contributions of IL-8, AST, APTT, CD4⁺ T-cell count, VL, BNP, and LDH to the risk of viral encephalopathy and provides a practical tool for clinical risk stratification.

Figure 2.

Dynamic nomogram for predicting the probability of viral encephalopathy in patients with severe fever with thrombocytopenia syndrome. Panels A and B show the model's predictions for two patients at different stages of the disease. Panel A: IL-8 27.33 pg/mL, AST 564.9 U/L, APTT 38.6 s, CD4⁺ count 77 cells/μL, VL 0.5 log10 copies/mL, BNP 40.7 pg/mL, and LDH 813 U/L; predicted probability of viral encephalopathy 34.9%. Panel B: IL-8 196.01 pg/mL, AST 252.9 U/L, APTT 39.4 s, CD4⁺ count 86 cells/μL, VL 1.0 log10 copies/mL, BNP 199.4 pg/mL, and LDH 1699 U/L; predicted probability 80.8%.
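The conversion from points to probability described above can be sketched in Python, assuming a standard logistic nomogram in which the predictor with the widest effect range is scaled to 100 points and each predictor's lowest-risk value scores zero. The intercept, coefficients, and ranges in the example are hypothetical, because the fitted coefficients are not reported in this section, so the sketch does not reproduce the 34.9% and 80.8% estimates for Patients A and B.

```python
import math


def make_nomogram(intercept, coefs, ranges, max_points=100.0):
    """Build points and probability functions for a logistic nomogram.

    coefs: {name: beta}; ranges: {name: (low, high)} (hypothetical inputs).
    """
    # reference value = the end of the range with the smallest contribution
    ref = {k: (lo if coefs[k] >= 0 else hi) for k, (lo, hi) in ranges.items()}
    widest = max(abs(coefs[k]) * (hi - lo) for k, (lo, hi) in ranges.items())
    scale = max_points / widest  # points per unit of linear predictor
    base_lp = intercept + sum(coefs[k] * ref[k] for k in coefs)

    def points(patient):
        return {k: scale * coefs[k] * (patient[k] - ref[k]) for k in coefs}

    def probability(total_points):
        lp = base_lp + total_points / scale  # back to the log-odds scale
        return 1.0 / (1.0 + math.exp(-lp))

    return points, probability


if __name__ == "__main__":
    # Hypothetical two-predictor model for illustration only.
    pts, prob = make_nomogram(-2.0, {"a": 0.5, "b": -1.0},
                              {"a": (0.0, 10.0), "b": (0.0, 2.0)})
    p = pts({"a": 4.0, "b": 1.0})
    print(p, prob(sum(p.values())))
```

Summing points and mapping the total back through the logistic function gives exactly the model's probability, which is why a nomogram is only a graphical re-expression of the fitted regression rather than a separate model.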

3.4. Dynamic nomogram model evaluation and validation

To evaluate the nomogram’s discriminative ability for viral encephalopathy risk in patients with severe fever with thrombocytopenia syndrome, we compared its AUC with that of the APACHE II score in the training and validation sets. In the training set, the nomogram achieved a higher AUC (0.958, 95% CI: 0.930–0.987) than the APACHE II score (0.839, 95% CI: 0.839–0.951; P = 0.0497) (Figure 3(a)). In the validation set, the nomogram achieved an AUC of 0.974 (95% CI: 0.924–1.000) versus 0.926 (95% CI: 0.860–0.991) for the APACHE II score (P = 0.18) (Figure 3(b)).

Figure 3.

Comparative ROC analysis of the nomogram and APACHE II score for predicting viral encephalopathy risk in SFTS patients. ((a). Training set; (b). validation set).

To address potential optimism bias from the feature selection and modeling process, we performed internal validation using a bootstrap optimism-correction procedure. All steps, from univariate screening to the final multivariate logistic regression, were repeated within each of 200 bootstrap resamples of the training set (n=134). The optimism-corrected performance estimates are presented in Table 2. The corrected AUC of 0.904 indicates that the model retains strong discriminatory power for risk stratification. However, the corrected calibration slope (-0.044) and intercept (-0.226) indicate substantial calibration bias, suggesting that the model’s predicted probabilities require recalibration before being used for absolute risk estimation.

Table 2.

Bootstrap optimism-corrected performance of the encephalopathy prediction nomogram.

Metric	Apparent	Optimism	Optimism-corrected
AUC	0.951	0.046	0.904
Brier score	0.087	-0.048	0.135
Calibration intercept	0.001	0.227	-0.226
Calibration slope	1.001	1.045	-0.044
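The optimism correction in Table 2 follows the general bootstrap scheme: refit the entire pipeline in each resample, measure how much better it performs on its own resample than on the original data, and subtract the average of that optimism from the apparent performance. The Python sketch below is generic; `fit` and `metric` are hypothetical placeholders for the nested selection-plus-regression pipeline and a performance measure such as the AUC helper shown, not the study's code.

```python
import random


def auc(y, scores):
    """Concordance (AUC): P(score of a case > score of a non-case), ties = 1/2."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def optimism_corrected(data, fit, metric, n_boot=200, seed=123):
    """Bootstrap optimism correction (hypothetical helper).

    fit(rows) must run the WHOLE pipeline (screening, selection, regression)
    so that selection is repeated inside every resample.
    metric(model, rows) returns a performance value, e.g. an AUC.
    """
    rng = random.Random(seed)
    apparent = metric(fit(data), data)
    optimism = 0.0
    for _ in range(n_boot):
        boot = [rng.choice(data) for _ in data]
        model = fit(boot)
        # performance on its own resample minus performance on the original data
        optimism += metric(model, boot) - metric(model, data)
    optimism /= n_boot
    return apparent, optimism, apparent - optimism


if __name__ == "__main__":
    # Toy, deliberately overfitting "model" that simply memorizes its rows.
    rows = list(range(200))
    memorize = lambda sample: set(sample)
    hit_rate = lambda model, sample: sum(r in model for r in sample) / len(sample)
    print(optimism_corrected(rows, memorize, hit_rate))
```

The memorizing toy model scores perfectly on its own data, and the correction pulls that estimate down to roughly the fraction of original rows a bootstrap sample contains, illustrating how nesting every step inside `fit` exposes optimism from data-driven selection.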

These results suggest that the nomogram predicts the risk of viral encephalopathy more accurately than the conventional APACHE II score, although the difference was statistically significant only in the training set. The calibration curves for both the training and validation sets showed good agreement between predicted and observed probabilities (Figure 4), with Hosmer-Lemeshow goodness-of-fit test P values of 0.48 and 0.123 in the training and validation sets, respectively. Decision curve analysis compared the net benefit of the nomogram with that of the APACHE II score. In the validation cohort, medical intervention guided by the nomogram yielded a greater net clinical benefit than the APACHE II score and the other strategies shown across the threshold probabilities examined (Figure 5).

Figure 4.

Calibration plots for assessing model calibration performance in panel A (training set) and panel B (validation set).
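The Hosmer-Lemeshow test reported above compares observed and expected event counts across groups of predicted risk. A minimal Python sketch is shown below, assuming equal-size groups formed after sorting by predicted probability; software packages differ in how they form groups, so P values can differ slightly. The chi-square tail probability uses the closed form that holds for an even number of degrees of freedom, which covers the usual 10 groups (8 degrees of freedom). Function names are hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-square variable with an EVEN number of degrees of freedom."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(y, p, groups=10):
    """Hosmer-Lemeshow statistic and P value with groups of sorted predicted risk."""
    order = sorted(range(len(p)), key=lambda i: p[i])
    n = len(order)
    stat = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        observed = sum(y[i] for i in idx)
        expected = sum(p[i] for i in idx)
        # (O - E)^2 / (E * (1 - E / n_g)) combines the event and non-event cells
        stat += (observed - expected) ** 2 / (expected * (1 - expected / len(idx)))
    return stat, chi2_sf_even_df(stat, groups - 2)
```

A large P value, as in both cohorts here, means the test found no evidence of miscalibration; it does not prove good calibration, which is why the calibration plots and the bootstrap calibration slope are reported alongside it.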

Figure 5.

Decision curve analysis demonstrating the net benefit of distinct strategies as the high-risk threshold varies.
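Decision curves plot net benefit against the threshold probability p_t at which a clinician would intervene: net benefit = TP/n − (FP/n) × p_t/(1 − p_t). A minimal Python sketch of that formula, together with the treat-all reference curve (treat-none is zero by definition), is given below; function names and example data are hypothetical.

```python
def net_benefit(y, p, threshold):
    """Net benefit of intervening when predicted risk >= threshold."""
    n = len(y)
    treated = [yi for yi, pi in zip(y, p) if pi >= threshold]
    tp = sum(treated)          # outcomes are coded 0/1
    fp = len(treated) - tp
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(y, threshold):
    """Reference strategy: intervene in every patient."""
    prevalence = sum(y) / len(y)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


if __name__ == "__main__":
    # Hypothetical outcomes and predicted risks for eight patients.
    y = [1, 1, 1, 0, 0, 0, 0, 0]
    p = [0.85, 0.60, 0.30, 0.45, 0.20, 0.15, 0.10, 0.05]
    for pt in (0.1, 0.2, 0.3, 0.4):
        print(pt, round(net_benefit(y, p, pt), 3), round(net_benefit_treat_all(y, pt), 3))
```

The weight p_t/(1 − p_t) encodes how many false positives a clinician would accept per true positive at that threshold, so a model is clinically useful over the thresholds where its curve lies above both reference strategies.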

3.5. Correlations among key variables and principal component analysis

To elucidate the interconnectedness of biomarkers, we performed Pearson correlation analysis and generated a network visualization (Supplemental Figure 2). AST showed the strongest correlation with LDH (r=0.76, P<1e-20), with secondary strong associations between APTT and LDH (r=0.63) and between AST and IL-8 (r=0.57; all P<0.001) (Table 3). Notably, VL, CD4⁺ T cells, and urea remained relatively independent within the network (mean r<0.3). AST and LDH emerged as central hubs, correlating significantly with 5 and 4 other biomarkers, respectively (all P<0.001). LDH had the highest betweenness centrality (0.28), underscoring its role as a principal mediator within the biomarker interaction network; its mediating effects are quantified in Table 4.

Table 3.

Correlation coefficients between variables and significance test results.

Variable 1	Variable 2	Correlation coefficient (r)	P value
AST	LDH	0.761	<1e-20
APTT	LDH	0.627	<0.001
BNP	LDH	0.613	<0.001
AST	IL-8	0.574	<0.001
APTT	AST	0.557	<0.001
AST	CK	0.538	<0.001
AST	BNP	0.525	<0.001
LDH	Risk	0.506	<0.001

Table 4.

Mediation effect of lactate dehydrogenase (LDH).

Variable	ACME (95% CI)	ACME P	ADE (95% CI)	ADE P	Total effect (95% CI)	Proportion mediated
APTT	0.217 (0.079 to 0.389)	0.006	0.256 (-0.023 to 0.655)	0.084	0.473 (0.266 to 0.761)	0.459
BNP	0.217 (0.085 to 0.402)	0.006	0.256 (-0.046 to 0.631)	0.090	0.473 (0.260 to 0.762)	0.459
AST	0.217 (0.073 to 0.388)	0.004	0.256 (-0.020 to 0.644)	0.080	0.473 (0.249 to 0.776)	0.459
CK	0.217 (0.078 to 0.408)	0.010	0.256 (-0.069 to 0.657)	0.100	0.473 (0.245 to 0.771)	0.459
IL-8	0.217 (0.089 to 0.394)	0.004	0.256 (-0.012 to 0.624)	0.066	0.473 (0.246 to 0.746)	0.459

Note. ACME = average causal mediation effect; ADE = average direct effect.
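Betweenness centrality, used above to describe LDH as a network hub, counts how often a node lies on shortest paths between other pairs of nodes. The Python sketch below implements Brandes' algorithm for an unweighted, undirected network and normalizes by (n − 1)(n − 2)/2. How the study built its network is not specified here; defining edges as pairs with |r| above a cutoff is an assumption, as are the helper names, so values from this sketch need not match the reported 0.28.

```python
from collections import deque


def correlation_edges(pairs, cutoff=0.5):
    """Edges for pairs whose |r| exceeds `cutoff` (hypothetical network rule)."""
    return [(a, b) for (a, b), r in pairs.items() if abs(r) > cutoff]


def betweenness(edges, normalized=True):
    """Brandes' algorithm for an unweighted, undirected graph."""
    nodes = sorted({v for edge in edges for v in edge})
    adj = {v: set() for v in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    score = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # breadth-first search from s, counting shortest paths (sigma)
        order, preds = [], {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0)
        dist = dict.fromkeys(nodes, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # accumulate pair dependencies in reverse BFS order
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    n = len(nodes)
    for v in score:
        score[v] /= 2.0  # each unordered pair was counted from both ends
        if normalized and n > 2:
            score[v] /= (n - 1) * (n - 2) / 2.0
    return score
```

A node scores 1.0 only when every shortest path between other nodes passes through it (as for the center of a star), so a value such as 0.28 indicates that a node brokers a substantial but not exclusive share of the indirect connections.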

To investigate the structural relationships among key biomarkers associated with encephalopathy in patients with severe fever with thrombocytopenia syndrome, we conducted principal component analysis (PCA) after excluding the relatively independent biomarkers VL, CD4⁺ T cells, and urea. The first three principal components accounted for 78.8% of the total variance. PC1 explained 53.3% of the variance and primarily reflected tissue damage-related indicators, with LDH and AST contributing loadings of 0.46 and 0.45, respectively (Supplemental Figure 3C). PC2 accounted for 13.9% of the variance and represented inflammation- and coagulation-related dimensions, with IL-8 and APTT contributing loadings of 0.22 and 0.41, respectively. PC3 explained 11.6% of the variance and was associated with coagulation and immune regulation, with CK contributing a loading of 0.47. Because the cumulative variance explained exceeded 70%, and considering the magnitude of the eigenvalues, we retained the first three principal components (PC1–PC3). These findings suggest potential structural relationships among biomarkers in the development of encephalopathy, indicating that tissue damage, inflammation, and coagulation may represent key mechanisms in its pathogenesis.

To visualize the distribution of key biomarkers associated with encephalopathy in patients with severe fever with thrombocytopenia syndrome, we generated PCA biplots (Supplemental Figure 4). Nonencephalopathy samples (red dots) clustered more tightly than encephalopathy samples (green triangles) across PC1, PC2, and PC3. In the PC1 vs. PC2 biplot, AST and LDH showed strong positive correlations along PC1, whereas Risk, APTT, IL-8, and BNP had higher loadings on PC2. The PC2 vs. PC3 biplot underscored the strong associations among Risk, APTT, and IL-8, with CK showing a strong positive loading along PC3. In the PC1 vs. PC3 biplot, Risk also showed small vector angles with APTT and IL-8. Compared with the other factors, APTT, IL-8, and CK had the smallest vector angles relative to Risk, suggesting that they may be pivotal in encephalopathy pathogenesis.

To evaluate the association between encephalopathy occurrence and the principal component scores, we constructed a logistic regression model with encephalopathy occurrence as the dependent variable and the first three principal components (PC1–PC3) as covariates. PC1 was strongly associated with encephalopathy risk (OR=93.1, 95% CI: 21.1–642; P=1.37 × 10⁻⁷), indicating robust predictive capacity. PC2 was also significantly associated with encephalopathy (OR=6.34, 95% CI: 2.05–25.7; P=0.003), indicating a contributory role in risk prediction. Conversely, PC3 was not significantly associated with encephalopathy risk (OR=0.726, 95% CI: 0.120–3.53; P=0.705). These results suggest that PC1 and PC2 are key drivers of encephalopathy development, whereas PC3 has minimal influence (Table 5). We then analyzed the distributions of the variables in PC1 and PC2 and evaluated the threshold effect of IL-8 using restricted cubic spline analysis. Significant threshold effects of IL-8 emerged at BNP=208, CK=376.5, and APTT=40.35 (Figure 6(a) and (b)). Specifically, IL-8 values below 86.8 pg/mL were significantly associated with lower risk, whereas values above 94.7 pg/mL were significantly associated with higher risk; values between these limits showed no significant association (Figure 6(c), Supplemental Table 3). The point estimate of the threshold was 90.1 pg/mL (P for nonlinearity = 0.002; overall P < 0.001). These results highlight the threshold-dependent influence of IL-8 on encephalopathy risk.

Table 5.

Logistic regression analysis of encephalopathy risk associated with principal component scores (PC1–PC3).

Term	OR	95% CI lower	95% CI upper	P value
PC1	93.1	21.1	642	1.37e-7
PC2	6.34	2.05	25.7	0.00341
PC3	0.726	0.120	3.53	0.705

Note. Values are odds ratios (ORs) with 95% confidence intervals (CIs) and corresponding P values. PC1 showed an exceptionally strong association (OR=93.1, P=1.37 × 10⁻⁷), whereas PC2 showed a moderate association (OR=6.34, P=0.003). PC3 was not significantly associated with encephalopathy risk (P=0.705).

Figure 6.

Threshold effect analysis of the effect of IL-8 on encephalopathy risk.
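The restricted cubic spline used for the IL-8 threshold analysis can be written explicitly. The Python sketch below builds the basis in Harrell's parameterization for knots t_1 < … < t_k: the linear term plus k − 2 cubic terms constrained so that the fitted curve is linear beyond the outer knots. The knot values in the example are hypothetical; rms places knots at quantiles by default (for four knots, the 5th, 35th, 65th, and 95th percentiles) and rescales the nonlinear terms by (t_k − t_1)², which changes the coefficients but not the fitted curve. The basis columns would then enter a logistic regression as ordinary covariates.

```python
def rcs_basis(x, knots):
    """Restricted cubic spline basis [x, s_1(x), ..., s_{k-2}(x)] (unnormalized).

    Harrell's parameterization, for j = 1..k-2:
    s_j(x) = (x-t_j)+^3 - (x-t_{k-1})+^3 (t_k-t_j)/(t_k-t_{k-1})
             + (x-t_k)+^3 (t_{k-1}-t_j)/(t_k-t_{k-1})
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("a restricted cubic spline needs at least 3 knots")

    def pos3(u):
        # truncated cubic (u)_+^3
        return max(u, 0.0) ** 3

    tk, tk1 = t[-1], t[-2]
    basis = [x]
    for j in range(k - 2):
        basis.append(
            pos3(x - t[j])
            - pos3(x - tk1) * (tk - t[j]) / (tk - tk1)
            + pos3(x - tk) * (tk1 - t[j]) / (tk - tk1)
        )
    return basis


if __name__ == "__main__":
    # Hypothetical IL-8 knots (pg/mL), for illustration only.
    knots = [30.0, 70.0, 110.0, 200.0]
    print(rcs_basis(90.1, knots))
```

Because the cubic and quadratic parts cancel beyond the last knot, extreme IL-8 values cannot produce the wild tail behavior of an unrestricted cubic, while the curve can still bend in the middle of the range where a threshold is sought.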

4. Discussion

A machine learning-driven nomogram incorporating seven routinely measured biomarkers (LDH, BNP, IL-8, APTT, CD4⁺ T-cell count, AST, and viral load) predicted viral encephalopathy in patients with severe fever with thrombocytopenia syndrome (SFTS) with high discrimination.¹³,¹⁴ The model achieved higher AUCs than the traditional APACHE II score. Furthermore, PCA uncovered three dominant patterns of biomarker covariation, dominated by markers of tissue injury (PC1), inflammation/coagulation (PC2), and immune-related processes (PC3), which together explained 78.8% of the variance. These patterns suggest hypotheses about interconnected biological pathways that may contribute to encephalopathy risk.

4.1. From PCA axes to an integrated pathological model: Theoretical links with established biomarkers

Our PCA revealed three core pathological axes: tissue damage (PC1), inflammation-coagulation (PC2), and immune dysregulation (PC3). Although this study did not measure ferritin, a key biomarker repeatedly validated in the prediction of SFTS encephalitis and mortality,¹⁵,¹⁶ our findings offer indirect insight into its possible central role. We propose a hypothetical integrative model: SFTSV infection may initially trigger widespread immune dysregulation (PC3), characterized by CD4⁺ T-cell depletion, leading to failure of viral control and disruption of immune homeostasis. This, in turn, drives an intense inflammatory response (PC2), in which factors such as IL-8 mediate neutrophil infiltration and endothelial damage, mutually exacerbating coagulation activation (prolonged APTT). Inflammation and direct viral damage jointly lead to multi-organ tissue necrosis (PC1, elevated LDH/AST). The hyperferritinemia consistently reported in the literature is likely a key manifestation and amplifier of this cascade; it serves as a marker of macrophage activation and cytokine storm,¹⁷ and may also exacerbate the pathological processes of PC2 and PC1 through its pro-inflammatory properties. Therefore, the axes identified in this model are not isolated; they depict an upstream immune-inflammatory storm that may be common to SFTS encephalitis, with ferritin potentially serving as a serological hub connecting these axes. Future studies validating these findings in cohorts that measure both ferritin and the biomarkers of this model would be of great value.

4.2. Key discoveries

4.2.1. Tissue injury

LDH showed a borderline independent association with SFTS encephalitis (OR = 1.18 per 100 U/L, P = 0.051). Elevated LDH reflects widespread cellular damage, consistent with its role in cytokine storms and hemophagocytic lymphohistiocytosis (HLH) in SFTS.¹⁸ PCA showed that PC1 (53.3% of variance), dominated by LDH and AST (r = 0.76), represents a tissue-damage axis strongly associated with encephalitis (OR = 93.1). The centrality of LDH in the biomarker network (betweenness centrality = 0.28) highlights its role in integrating hepatic, muscular, and neurological injury. Additionally, the strong correlation between AST and CK levels (r = 0.54) further supports multiorgan injury as a characteristic of SFTS encephalitis.¹⁹

4.2.2. Cardiocerebral interaction

BNP was strongly associated with encephalitis (OR = 3.85), pointing to a previously underexplored cardiocerebral interaction in SFTS. This finding builds on reports of cardiac involvement in severe SFTS and suggests that the myocardial strain reflected by elevated BNP may contribute to neural injury through microcirculatory dysfunction.²⁰ In the PCA biplot, BNP aligned with the PC2 inflammation-coagulation axis, supporting the idea that myocardial strain may amplify neuroinflammation through shared pathways.

4.2.3. Immune dysfunction

The study found that CD4⁺ T cells (OR = 0.55) act as a protective factor, consistent with previous findings that lymphocytopenia exacerbates the severity of SFTS,²¹ and providing clinical support for a key pathogenic mechanism of SFTSV. This association is consistent with in vitro and pathological evidence: Li et al.²² demonstrated that SFTSV induces T-cell apoptosis through the Fas/FasL pathway. Our observation that patients who develop encephalitis have significantly lower CD4⁺ T-cell counts suggests that this virus-induced lymphopenia is clinically consequential. The depletion of these crucial immune cells likely creates a permissive environment for unchecked viral replication and weakens the regulation of inflammatory responses, thereby facilitating progression to severe neurological complications. This finding is also consistent with that of Wang et al., who attributed the pathogenesis of SFTSV to the combined effects of viral immune evasion and an excessive host inflammatory response.²³ The independence of CD4⁺ T-cell count from viral load in our correlation analysis further implies that this immune dysregulation axis (PC3) contributes to encephalitis risk through mechanisms beyond direct viral cytopathy, possibly involving loss of immune homeostasis and failure to control the inflammatory cascade represented by PC2.

4.2.4. Synthesized pathophysiological model for SFTS-associated encephalopathy

Collectively, our biomarker analysis and PCA, interpreted in the context of prior research, support a synergistic model of SFTS encephalitis. The process may be initiated by high viral load, leading to direct tissue injury (PC1: ↑LDH/AST) and a pro-inflammatory state (PC2: ↑IL-8, ↑APTT). SFTSV simultaneously drives immune dysregulation (PC3: ↓CD4⁺ T cells) via apoptosis, impairing host defense and inflammatory control. These axes do not operate in a linear sequence but interact: tissue damage releases damage-associated molecular patterns (DAMPs) that fuel inflammation, while the inflammatory and coagulopathic milieu (PC2) exacerbates endothelial and organ injury. Biomarkers such as ferritin are likely to rise as a consequence of this integrated storm. This framework moves beyond listing independent risk factors toward a networked pathophysiology, in which the biomarkers in our nomogram serve not only as predictors but also as sentinels of interconnected biological processes. This mechanistic understanding points to potential therapeutic targets, such as modulating the IL-8 pathway or mitigating T-cell apoptosis.
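
The networked view above rests on the biomarker network in which LDH showed a betweenness centrality of 0.28, but this section does not describe how that network was built. As a hedged sketch, assuming an unweighted, undirected graph in which two biomarkers are linked when |r| reaches a chosen cut-off (the cut-off is an assumption), betweenness can be computed with Brandes' algorithm and normalized by the number of node pairs that exclude the node. `correlation_edges` and `betweenness` are hypothetical helpers.

```python
from collections import deque
from typing import Dict, Iterable, List, Tuple


def correlation_edges(corr: Dict[Tuple[str, str], float], threshold: float) -> List[Tuple[str, str]]:
    """Keep biomarker pairs whose absolute correlation reaches the cut-off."""
    return [pair for pair, r in corr.items() if abs(r) >= threshold]


def betweenness(nodes: Iterable[str], edges: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Normalized betweenness centrality of an unweighted, undirected graph (Brandes)."""
    adj: Dict[str, List[str]] = {v: [] for v in nodes}
    for a, b in edges:  # assumes each unordered pair appears once
        adj[a].append(b)
        adj[b].append(a)
    cb = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma) and predecessors
        stack: List[str] = []
        pred: Dict[str, List[str]] = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        # Accumulate pair dependencies in order of non-increasing distance from s
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                cb[w] += delta[w]
    n = len(adj)
    # Each unordered pair was counted from both endpoints, so divide by
    # (n - 1)(n - 2), i.e. twice the number of pairs not involving the node
    scale = 1.0 / ((n - 1) * (n - 2)) if n > 2 else 0.0
    return {v: c * scale for v, c in cb.items()}
```

A weighted network (for example with 1 − |r| as edge length) would need Dijkstra's algorithm in place of breadth-first search; which variant the authors used is not stated here.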

4.2.5. IL-8 threshold

IL-8 shows a nonlinear threshold effect (cutoff 90.1 pg/mL), similar to that reported in COVID-19 encephalitis.²⁴ When serum IL-8 exceeds 90.1 pg/mL, we recommend immediate escalation of anti-inflammatory and immunomodulatory strategies to interrupt the neutrophil-driven cascade toward encephalitis. Clinically, this may include early initiation of moderate-to-high-dose corticosteroids (e.g., methylprednisolone 1–2 mg/kg/day for 3–5 days, tapered according to response), combined with close neurological monitoring (daily Glasgow Coma Scale assessment and EEG if available) and consideration of ICU transfer if additional organ dysfunction emerges. In centers with access to experimental IL-8 pathway inhibitors or biologics, these could be evaluated under compassionate use. This threshold-guided approach mirrors successful biomarker-directed protocols in COVID-19 encephalitis and ARDS, where exceeding similar cytokine cutoffs prompted timely immunosuppression and improved neurological outcomes. By providing an actionable, serum-based trigger even in thrombocytopenic patients in whom lumbar puncture is contraindicated, the model offers practical bedside decision support and may reduce progression to severe encephalopathy.

4.2.6. Superiority over APACHE II

In the validation set, the nomogram showed better discrimination than the APACHE II score (AUC 0.974 vs. 0.926) together with acceptable calibration (Hosmer–Lemeshow P = 0.123). This supports the view that disease-specific biomarker panels outperform generic severity scores, in line with findings in recent sepsis studies.²⁵
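
The Hosmer–Lemeshow P-value compares observed and expected event counts across groups of predicted risk; a large P-value indicates no detectable miscalibration. The number of groups is not stated here, so the sketch below assumes the common choice of 10 equal-size groups (8 degrees of freedom) and uses the exact closed-form chi-square tail, which holds only for even degrees of freedom. `hosmer_lemeshow` and `chi2_sf_even_df` are hypothetical helpers, not the authors' code.

```python
import math
from typing import Sequence, Tuple


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability P(X > x); this closed form holds for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive, even number of degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(y: Sequence[int], p: Sequence[float], groups: int = 10) -> Tuple[float, float]:
    """Hosmer-Lemeshow statistic and P-value from equal-size groups of predicted risk.

    y holds 0/1 outcomes and p predicted probabilities strictly between 0 and 1.
    Degrees of freedom are groups - 2, so `groups` must be even in this sketch.
    """
    order = sorted(range(len(p)), key=lambda i: p[i])
    n = len(order)
    stat = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        if not idx:
            continue
        m = len(idx)
        observed = sum(y[i] for i in idx)
        expected = sum(p[i] for i in idx)
        # Event and non-event cells combined: (O - E)^2 / (E * (1 - E / m))
        stat += (observed - expected) ** 2 / (expected * (1.0 - expected / m))
    return stat, chi2_sf_even_df(stat, groups - 2)
```

In routine analyses `scipy.stats.chi2.sf` would normally supply the tail probability for any degrees of freedom; the statistic is also known to be sensitive to the number of groups chosen.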

4.2.7. Comparison with existing prediction models for SFTS encephalitis

Several prediction models for SFTS-associated encephalopathy have recently been developed. Wu et al.¹⁵ (2025) developed prediction models for encephalitis and in-hospital death in SFTS patients; their encephalitis nomogram, based on age, tremor, dysphoria, hypertension, and ferritin in a single-center cohort, achieved an AUC of 0.895. These risk factors, particularly age and ferritin, complement the biomarkers selected in our study (e.g., LDH, IL-8, and CD4⁺ T-cell count). Zheng et al.⁵ developed a reservoir computing with boosted topology (RC-BT) model using nine clinical and laboratory parameters (e.g., calcium, muscle soreness, troponin T), with an AUC of 0.899. While these models provide valuable tools for risk stratification, they rely primarily on demographic characteristics, clinical symptoms, and routine laboratory variables.

In comparison, our machine learning-driven nomogram incorporates seven readily available biomarkers (LDH, BNP, IL-8, prolonged APTT, CD4⁺ T-cell count, AST, and viral load), employs advanced feature selection (Boruta + SHAP-RFE) together with PCA-derived mechanistic insights, and achieves a higher AUC (0.974 vs. 0.895 reported by Wu et al., albeit in a different cohort). Whereas Wu et al.’s model emphasizes demographic and ferritin-based risk stratification suited to rapid bedside screening in resource-limited settings, our approach captures dynamic immune-inflammatory and tissue-injury axes (PC1–PC3), offering greater mechanistic interpretability and actionable thresholds (e.g., IL-8 > 90.1 pg/mL for immunomodulation). Our nomogram is therefore particularly suited to precision risk stratification in tertiary centers where biomarker panels and dynamic monitoring are feasible, while remaining complementary to ferritin-inclusive models; both approaches warrant broader validation.

Our model extends this work in several important directions. First, by incorporating CD4⁺ T cell counts, we directly capture the virus-induced immune dysregulation that is central to SFTS pathogenesis.²⁶ Second, the inclusion of IL-8—a key neutrophil chemokine—reflects the inflammatory cascade underlying encephalitis development. Third, our use of SHAP analysis enabled identification of nonlinear threshold effects (e.g., IL-8 > 90.1 pg/mL) that can guide clinical decision-making, such as when to consider intensified anti-inflammatory therapy.
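
This section does not detail how the 90.1 pg/mL cutoff was read from the SHAP analysis. One common approach, assumed here, takes the feature value at which the SHAP dependence curve for IL-8 crosses zero, that is, where the feature's contribution switches from lowering to raising the model output. `shap_zero_crossing` is a hypothetical helper; in practice the SHAP values would come from a library such as `shap`, which this self-contained sketch does not use.

```python
from typing import Optional, Sequence


def shap_zero_crossing(values: Sequence[float], shap_values: Sequence[float]) -> Optional[float]:
    """Lowest feature value at which the SHAP dependence curve turns from <= 0 to > 0.

    Points are sorted by feature value and the crossing is located by linear
    interpolation between neighbouring points; returns None if the curve never
    crosses. Real SHAP dependence plots are noisy, so the curve would normally
    be smoothed first (e.g. binned means); no smoothing is done here.
    """
    pts = sorted(zip(values, shap_values))
    for (x0, s0), (x1, s1) in zip(pts, pts[1:]):
        if s0 <= 0.0 < s1:
            # s1 > s0 is guaranteed by the condition, so the division is safe
            return x0 + (0.0 - s0) * (x1 - x0) / (s1 - s0)
    return None
```

With several sign changes, this sketch reports only the first; a noisy curve may therefore call for smoothing or bootstrapping before a single cutoff is reported.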

In terms of predictive performance, our model’s AUC of 0.974 in the validation set is highly competitive. More importantly, we believe the mechanistic interpretability (via immune-inflammatory markers) and granular clinical guidance (via SHAP-derived thresholds) offered by our approach represent meaningful advances beyond predictive accuracy alone. Future multicenter studies are warranted to validate our findings and to assess whether the inclusion of these immune-inflammatory markers improves risk stratification and clinical outcomes in broader populations.
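
An AUC of 0.974 can be read as the probability that a randomly chosen patient with encephalopathy receives a higher predicted risk than a randomly chosen patient without it. The sketch below computes the AUC in exactly that pairwise (Mann–Whitney) form; `auc_mann_whitney` is a hypothetical helper, not the authors' implementation.

```python
from typing import Sequence


def auc_mann_whitney(y: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random case outscores a random non-case.

    Ties count one half, which matches the area under the empirical ROC curve.
    Runs in O(n_pos * n_neg), adequate for cohorts of a few hundred patients.
    """
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = 0.0
    for a in pos:
        for b in neg:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Because the AUC depends on the case mix of the cohort in which it is estimated, comparisons with AUCs reported in other cohorts are indicative only. A paired comparison on the same patients, as for the nomogram versus APACHE II, is usually made with DeLong's test, which this sketch does not implement.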

4.3. Study limitations

The study has several limitations. First, its single-center retrospective design limits generalizability, especially given regional variation in SFTSV strains.¹⁵,²⁷ The lack of serial viral load measurements prevents analysis of how viral kinetics over time affect encephalitis risk, and the absence of cerebrospinal fluid biomarkers restricts our understanding of intrathecal inflammation.²⁴ Second, although our institutional protocol mandates blood sampling immediately upon admission, the exact time of blood draw was not recorded for each patient. We therefore cannot rule out that minor variations in sampling time within the first 24 hours introduced some bias, particularly for rapidly changing biomarkers such as IL-8 and BNP; prospective studies with time-stamped sample collection are warranted to confirm our findings. Third, because of the retrospective design, some known prognostic biomarkers (such as ferritin) were not systematically measured in this cohort.

Two further limitations should be noted. First, the impact of therapeutic interventions on encephalitis risk was not fully evaluated. Previous studies have shown that combined IVIG (>80 g) and glucocorticoid therapy can reduce mortality in patients with SFTS-associated neurological complications.¹⁵,²⁸ Because of its retrospective design, this study did not address treatment confounding through propensity score matching or time-dependent analysis; validation in prospective cohorts is warranted. Second, the study did not examine the moderating effects of underlying conditions (e.g., hypertension). Hypertension has been reported as an independent risk factor for SFTS encephalitis¹⁵ and may influence the model's predictive performance through vascular endothelial damage or amplified inflammation. Although some comorbidity data were collected, no interaction analyses were conducted; future studies could use stratified analyses or machine learning models to explore this further. These limitations may restrict the generalizability of the predictive model but do not affect the internal validity of the current core findings.
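
A minimal way to test the hypertension interaction described above, offered as a sketch rather than an analysis the authors performed, is to add a product term to the logistic model. Here $X$ stands for any model predictor (for example BNP) and $H$ is a hypothetical 0/1 hypertension indicator; the odds ratio for a one-unit increase in $X$ then differs between strata by the factor $e^{\beta_3}$:

```latex
\operatorname{logit} \Pr(Y = 1 \mid X, H) = \beta_0 + \beta_1 X + \beta_2 H + \beta_3 X H

\mathrm{OR}_{X \mid H = 0} = e^{\beta_1}, \qquad
\mathrm{OR}_{X \mid H = 1} = e^{\beta_1 + \beta_3}, \qquad
\frac{\mathrm{OR}_{X \mid H = 1}}{\mathrm{OR}_{X \mid H = 0}} = e^{\beta_3}
```

A Wald or likelihood-ratio test of $\beta_3 = 0$ then asks whether the predictor's effect on encephalitis risk differs between normotensive and hypertensive patients.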

4.4. Future research directions

Future research should focus on multicenter validation to enhance the model's robustness. Dynamic biomarker monitoring, for example through mHealth platforms, for real-time risk stratification is also a priority. Integrating multiomics data could reveal genotype-phenotype interactions related to encephalitis susceptibility, and interventional studies targeting IL-8 or CD4⁺ T-cell preservation may help validate therapeutic strategies. Although our model incorporates new immune and inflammatory biomarkers (CD4⁺ T cells, IL-8), direct comparisons of its performance with models based solely on conventional biomarkers (such as ferritin) are limited; future studies should validate and compare this model in prospective, multicenter cohorts with a more comprehensive biomarker set. Finally, extending the PCA-guided mechanistic framework to other viral encephalitides may uncover shared pathophysiological axes and improve management paradigms globally.

5. Conclusions

In conclusion, PCA identified three core pathophysiological axes in SFTS patients: tissue injury (LDH/AST), inflammation-coagulation interaction (IL-8/APTT), and immune dysregulation. These axes provide novel insights into disease progression and prognosis. By relating them to the existing literature, including the roles of ferritin, LDH, and IL-8 and the protective effect of CD4⁺ T cells in the context of SFTSV-induced apoptosis, we deepen the mechanistic understanding of SFTS encephalitis. Moreover, the nonlinear threshold effect of IL-8 at 90.1 pg/mL offers substantial translational potential for early risk stratification. These findings support early biomarker monitoring to improve patient outcomes, with timely anti-inflammatory intervention considered when IL-8 exceeds 90.1 pg/mL. Furthermore, following the rigorous validation framework and emphasis on clinical practicality of Tan et al.,²⁹ we have developed an offline calculator or simplified scoring tool for the nomogram proposed in this study (see Appendix). This tool improves accessibility, facilitates rapid risk assessment within routine clinical workflows, and enhances translational value, particularly in resource-limited settings.
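
The Appendix calculator is not reproduced in this section, and the model's intercept, coefficient units, and axis ranges are not given here. The sketch below shows how a logistic-regression nomogram is conventionally turned into a points-based calculator: the predictor with the widest log-odds span gets a 0–100 point axis and the others are scaled proportionally. `Predictor` and `nomogram_score` are hypothetical names, and any coefficients supplied (for instance β = ln OR from the reported odds ratios) would have to match the units used when the model was fitted.

```python
import math
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Predictor:
    beta: float  # log-odds coefficient per unit (ln of the per-unit OR)
    low: float   # lower end of the nomogram axis
    high: float  # upper end of the nomogram axis


def nomogram_score(predictors: Dict[str, Predictor], intercept: float,
                   patient: Dict[str, float]) -> Tuple[float, float]:
    """Return (total points, predicted probability) for one patient."""
    max_span = max(abs(p.beta) * (p.high - p.low) for p in predictors.values())
    points = 0.0
    lp = intercept  # linear predictor on the log-odds scale
    for name, p in predictors.items():
        # Clamp to the drawn axis: a printed nomogram cannot show values beyond it
        x = min(max(patient[name], p.low), p.high)
        lp += p.beta * x
        # Measure points from the axis end that contributes the least risk
        ref = p.low if p.beta >= 0 else p.high
        points += 100.0 * p.beta * (x - ref) / max_span
    probability = 1.0 / (1.0 + math.exp(-lp))
    return points, probability
```

Both values are returned because the probability comes from the linear predictor itself; the points total only reproduces the graphical layout that clinicians read off the printed nomogram.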

Supplemental material

Supplemental material - Machine learning-driven nomogram for predicting viral encephalopathy risk in SFTS patients

Supplemental material for Machine learning-driven nomogram for predicting viral encephalopathy risk in SFTS patients by Daguang Cui, Chenhu Ma, Dan Wang, Lingyan Xiao, Dongyang Shi, Hui Dong, Kai Yang, Yishan Zheng in DIGITAL HEALTH


Footnotes

ORCID iDs

Daguang Cui

Chenhu Ma

Dan Wang

Lingyan Xiao

Hui Dong

Kai Yang

Yishan Zheng

Ethical considerations

The study was approved by the ethics committee of the Second Hospital of Nanjing (2023-L-S-023). All procedures involving human participants were conducted in accordance with the ethical standards of the Declaration of Helsinki.

Consent to participate

Written informed consent was obtained from all individual participants included in the study.

Consent for publication

All authors confirm that written informed consent was obtained from the patients/participants (or their legal guardians if applicable) for the publication of their individual clinical data, images, or any other potentially identifiable information included in this manuscript. The data presented have been anonymized to protect patient privacy and comply with ethical standards. No identifiable details that could disclose the identity of participants are included in this publication.

Authors’ contributions

DGC, CHM and DW contributed equally as the first authors. YSZ, KY and HD contributed equally as the corresponding authors. DGC and KY contributed to writing the paper. HD, CHM and DW developed the methodology and contributed to revising the paper. LYX and DYS conducted the statistical analysis. YSZ provided financial support. The final manuscript received approval from all the authors.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: grants from Nanjing Second Hospital Reserve Talents (0316301), the Talent Lift Project of Nanjing Second Hospital (RCMS23010), the Natural Science Foundation Project of Nanjing University of Chinese Medicine (XZR2024043), the 2023 Nanjing Second Hospital Talent Support Project Grant (RCZD23003), and the Nanjing Infectious Disease Clinical Medical Center, Innovation Center for Infectious Diseases of Jiangsu Province (CXZX202232).

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

Owing to patient privacy, the research data are not publicly available. Questions and data requests may be directed to the corresponding author.

Supplemental material

Supplemental material for this article is available online.

References

1. Liang, Zhang, et al. Fever with thrombocytopenia associated with a novel bunyavirus in China. N Engl J Med 2011; 364(16): 1523–1532. https://doi.org/10.1056/NEJMoa1010095

2. Zhang, Tan, Jin, et al. Severe fever with thrombocytopenia syndrome virus trends and hotspots in clinical research: A bibliometric analysis of global research. Front Public Health 2023; 11: 1120462. https://doi.org/10.3389/fpubh.2023.1120462

3. Zhao, et al. Epidemiology, clinical characteristics, and treatment of severe fever with thrombocytopenia syndrome. Infect Med (Beijing) 2022; 1(1): 40–49. https://doi.org/10.1016/j.imj.2021.10.001

4. Yang, Wang, Huang, et al. Establishment and validation of a prognostic nomogram for severe fever with thrombocytopenia syndrome: A retrospective observational study. PLoS One 2024; 19(10): e0311924. https://doi.org/10.1371/journal.pone.0311924

5. Zheng, Geng, et al. A Reservoir Computing with Boosted Topology Model to Predict Encephalitis and Mortality for Patients with Severe Fever with Thrombocytopenia Syndrome: A Retrospective Multicenter Study. Infect Dis Ther 2023; 12(5): 1379–1391. https://doi.org/10.1007/s40121-023-00808-y

6. Wang, Huang, Liu, et al. Risk factors of severe fever with thrombocytopenia syndrome combined with central neurological complications: A five-year retrospective case-control study. Front Microbiol 2022; 13: 1033946. https://doi.org/10.3389/fmicb.2022.1033946

7. Alfred, Obit. The roles of machine learning methods in limiting the spread of deadly diseases: A systematic review. Heliyon 2021; 7(6): e07371. https://doi.org/10.1016/j.heliyon.2021.e07371

8. Zhang, et al. Lactate Dehydrogenase/Albumin to Urea Ratio: A Novel Prognostic Indicator for Adverse Outcomes in Patients With Severe Fever With Thrombocytopenia Syndrome. J Med Virol 2025; 97(6): e70428. https://doi.org/10.1002/jmv.70428

9. Cho, Lee. Estimating severe fever with thrombocytopenia syndrome transmission using machine learning methods in South Korea. Sci Rep 2021; 11(1): 21831. https://doi.org/10.1038/s41598-021-01361-9

10. Yang, Quan, et al. A multicenter study on developing a prognostic model for severe fever with thrombocytopenia syndrome using machine learning. Front Microbiol 2025; 16: 1557922. https://doi.org/10.3389/fmicb.2025.1557922

11. Ben Salem, Ben Abdelaziz. Principal Component Analysis (PCA). Tunis Med 2021; 99(4): 383–389.

12. Collins, Reitsma, Altman, et al. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. BMJ 2015; 350: g7594. https://doi.org/10.1136/bmj.g7594

13. Zheng, Zhang. Clinical features of severe fever with thrombocytopenia syndrome and analysis of risk factors for mortality. BMC Infect Dis 2021; 21(1): 1253. https://doi.org/10.1186/s12879-021-06946-3

14. Zhu, Zhou, Tao, et al. Identification of early prognostic biomarkers in Severe Fever with Thrombocytopenia Syndrome using machine learning algorithms. Ann Med 2025; 57(1): 2451184. https://doi.org/10.1080/07853890.2025.2451184

15. Tan, Zhao, et al. Prediction models for encephalitis and in-hospital death associated with severe fever with thrombocytopenia syndrome. BMC Infect Dis 2025; 25(1): 1558. https://doi.org/10.1186/s12879-025-12018-7

16. Xie, Dang, et al. Prognostic Value of Serum Ferritin for Patients with Severe Fever with Thrombocytopenia Syndrome: A Single-Center Retrospective Cohort Study. Infect Dis Ther 2023; 12(3): 979–988. https://doi.org/10.1007/s40121-023-00784-3

17. Kaneko, Shikata, Matsukage, et al. A patient with severe fever with thrombocytopenia syndrome and hemophagocytic lymphohistiocytosis-associated involvement of the central nervous system. J Infect Chemother 2018; 24(4): 292–297. https://doi.org/10.1016/j.jiac.2017.10.016

18. Zhang, Gong, Zeng, et al. Unveiling fatal risk factors: Predicting hemophagocytic lymphohistiocytosis in SFTS patients. PLoS Negl Trop Dis 2025; 19(6): e0013207. https://doi.org/10.1371/journal.pntd.0013207

19. Bogovič, Lotrič-Furlan, Ogrinc, et al. Elevated levels of serum muscle enzymes in the initial phase of tick-borne encephalitis. Infect Dis (Lond) 2024; 56(6): 504–509. https://doi.org/10.1080/23744235.2024.2335349

20. Chen, et al. Clinical characteristics and influencing factors of severe fever with thrombocytopenia syndrome complicated by viral myocarditis: a retrospective study. BMC Infect Dis 2024; 24(1): 240. https://doi.org/10.1186/s12879-024-09096-4

21. Song, Zou, Wang, et al. Cytokines and lymphocyte subsets are associated with disease severity of severe fever with thrombocytopenia syndrome. Virol J 2024; 21(1): 126. https://doi.org/10.1186/s12985-024-02403-0

22. Xiong, et al. Depletion but Activation of CD56(dim)CD16(+) NK Cells in Acute Infection with Severe Fever with Thrombocytopenia Syndrome Virus. Virol Sin 2020; 35(5): 588–598. https://doi.org/10.1007/s12250-020-00224-3

23. Wang, Tan, et al. The Endless Wars: Severe Fever With Thrombocytopenia Syndrome Virus, Host Immune and Genetic Factors. Front Cell Infect Microbiol 2022; 12: 808098. https://doi.org/10.3389/fcimb.2022.808098

24. Guasp, Muñoz-Sánchez, Martínez-Hernández, et al. CSF Biomarkers in COVID-19 Associated Encephalopathy and Encephalitis Predict Long-Term Outcome. Front Immunol 2022; 13: 866153. https://doi.org/10.3389/fimmu.2022.866153

25. Guo, Liao, Liu. Prognostic value of TNF-α, PCT, IL-8, and HBP, combined with APACHE II score in patients with sepsis. J Infect Dev Ctries 2025; 19(3): 439–445. https://doi.org/10.3855/jidc.20383

26. Zhang, Weng, et al. CD4 T cell loss and Th2 and Th17 bias are associated with the severity of severe fever with thrombocytopenia syndrome (SFTS). Clin Immunol 2018; 195: 8–17. https://doi.org/10.1016/j.clim.2018.07.009

27. Takahashi, Maeda, Suzuki, et al. The first identification and retrospective study of Severe Fever with Thrombocytopenia Syndrome in Japan. J Infect Dis 2014; 209(6): 816–827. https://doi.org/10.1093/infdis/jit603

28. Liu, Yun, Tong, Hanwen, Fei, et al. Effect of intravenous immunoglobulin therapy on the prognosis of patients with severe fever with thrombocytopenia syndrome and neurological complications. Front Immunol 2023; 14: 1118039. https://doi.org/10.3389/fimmu.2023.1118039

29. Tan, Lin, et al. OHCCPredictor: an online risk stratification model for predicting survival duration of older patients with hepatocellular carcinoma. Hepatol Int 2024; 18(2): 550–567. https://doi.org/10.1007/s12072-023-10516-x
