Abstract
Objective
To develop a machine learning-driven predictive model for early identification of viral encephalopathy risk in SFTS patients.
Materials and methods
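This retrospective study included 192 SFTS patients (58 with viral encephalopathy) treated at Nanjing Second Hospital between June 2022 and December 2024. The Boruta algorithm and SHAP-based recursive feature elimination with cross-validation (SHAP-RFE-CV) identified nine candidate predictors, which were refined by LASSO regression (λ1se = 0.018). Multivariate logistic regression was used to identify independent risk factors and to construct a dynamic nomogram. Model performance was compared with that of the APACHE II score using the AUC, calibration curves, and decision curve analysis (DCA). Principal component analysis (PCA) was used to explore biomarker interactions.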
Results
Key predictors included viral load, LDH, BNP, IL-8, APTT, and CD4+ T cells. Independent risk factors were
Discussion
The nomogram integrates dynamic biomarkers while addressing multicollinearity and showed higher discrimination than a traditional severity score. Together, PC1 and PC2 explained 67.2% of the total variance of the encephalopathy-related biomarkers, highlighting tissue damage and inflammatory-coagulation dysregulation as candidate central mechanisms.
Conclusion
This study establishes a clinically actionable nomogram for SFTS-associated encephalopathy risk stratification. PCA suggests mechanistic interactions among biomarkers and potential therapeutic targets.
Keywords
1. Introduction
Severe fever with thrombocytopenia syndrome (SFTS), caused by SFTSV, is a life-threatening tick-borne disease with frequent neurological complications like encephalitis. 1 Recent bibliometric analyses have systematically mapped the global trends and hotspots in SFTSV clinical research, identifying antiviral therapy, immunotherapy, and virus transmission mechanisms as the primary focuses of current investigations. 2 Early prediction of SFTS-associated encephalopathy remains challenging due to the lack of specific prognostic tools. 3 Current models, such as APACHE II, show limited specificity and fail to capture dynamic biomarker interactions critical to disease progression.4,5
Existing prognostic models often rely on univariate or traditional multivariate analyses, which may overlook complex feature dependencies. 6 For instance, lactate dehydrogenase (LDH) and viral load (VL) are established prognostic markers, but their synergistic effects with coagulation (e.g., APTT) and inflammatory cytokines (e.g., IL-8) remain unclear. Additionally, most studies neglect advanced feature selection algorithms like Boruta or SHAP-based recursive elimination, which mitigate overfitting and improve model precision.7–10
To address these gaps, we developed a hybrid machine learning strategy combining Boruta and SHAP-based recursive feature elimination (RFE-CV) to identify key predictors (e.g., VL, LDH, CD4+ T cells). Principal component analysis (PCA) was further employed to decode latent biomarker interactions, revealing three principal
Compared with the APACHE II score, our dynamic nomogram showed higher discrimination in the validation set (AUC: 0.974 vs. 0.926) and greater clinical utility. PCA identified PC1 (OR = 93.1) and PC2 (OR = 6.34) as significant risk factors for encephalitis. This study provides a translational tool for precise risk stratification and mechanistic insights into SFTS-associated encephalopathy.
2. Materials and methods
2.1. Participants
This single-center retrospective cohort study included 192 SFTS patients diagnosed via qRT-PCR at Nanjing Second Hospital between June 2022 and December 2024. Viral encephalopathy was defined as SFTS combined with any of the following neurological symptoms: (1) muscle tremors (tongue/jaw/limbs); (2) cognitive dysfunction (orientation/memory deficits, slowed reactions, aphasia); (3) altered consciousness (drowsiness/coma); or (4) seizures. Patients were randomly split into training (70%) and validation (30%) sets and were further divided into a nonencephalitic group and an encephalitic group according to the absence or presence of viral encephalopathy. All patients met the following inclusion criteria: laboratory-confirmed SFTSV infection (positive for serum viral RNA), age ≥18 years, and complete clinical data. The exclusion criteria were other central nervous system diseases (e.g., stroke, brain tumor, or epilepsy); concurrent active viral infections (e.g., HIV, HBV, or HCV); and >20% missing clinical data.
2.2. Data collection
Clinical and laboratory data collected within 24 hours of admission included the following.
Demographics: age, sex, and underlying conditions (hypertension, diabetes, cancer, etc.).
Symptoms: fever spikes, neurological deficits (tremors, cognitive impairment, seizures), consciousness changes, etc.
Laboratory markers:
Inflammatory: CRP, PCT, IL-6/8/10/IFN-γ.
Liver: ALT, AST, TBil, ALB.
Cardiac: BNP, LDH, CK/CK-MB/cTnI.
Coagulation: PT, APTT, FIB, FDP, D-dimer.
Immune: CD4+/CD8+ T cells, NK cells.
Virological: SFTSV viral load (VL).
Severity scores: APACHE II and RISK mortality risk assessment.
2.3. Variable selection and model creation
To build robust predictive models and analyze the relationships between variables, we combined traditional statistical methods with machine learning:
1. Univariate and multivariate logistic regression: these traditional regression methods were used to identify significant predictors of viral encephalopathy. They provide interpretable results and are used to construct nomograms.
2. Boruta algorithm: by comparing the importance of each feature with that of randomly permuted "shadow" copies and applying rigorous statistical testing, Boruta identifies truly informative features, thereby enhancing model interpretability and generalizability.
3. SHAP value-driven recursive feature elimination with cross-validation (RFE-CV): SHAP values quantify the contribution of each feature to the model output, and features with the lowest SHAP contributions are eliminated recursively, directly optimizing the feature subset. Cross-validation (CV) repeatedly validates feature importance, reducing random bias and ensuring robust results.
4. LASSO regression: particularly useful for datasets with many correlated variables, LASSO performs variable selection by shrinking the coefficients of less important variables to zero, yielding a more concise model. This helps handle multicollinearity and improves model interpretability.
5. PCA: an unsupervised method that transforms correlated variables into orthogonal components on the basis of their covariance structure, thereby reducing dimensionality, noise, and redundancy and enabling visualization of high-dimensional relationships. It is commonly used to reveal and simplify the core relationship structure of multivariate data.
By comparing these methods, we aim to select the optimal model that balances predictive accuracy and clinical interpretability. Dynamic nomograms are constructed using the most important predictive factors identified through these analyses.
First, in the training set, variables that differed significantly (P < 0.05) between the encephalitis and nonencephalitis groups at baseline in univariate analysis, or that were considered clinically meaningful, were entered into the Boruta algorithm. Second, the variables not rejected by the Boruta algorithm were further screened via SHAP value-driven recursive feature elimination (RFE-CV). Finally, multivariate logistic regression combined with variance inflation factor (VIF) analysis was used to select the variables (inclusion criteria: P < 0.05 or VIF < 5) for constructing the dynamic nomogram. Correlation and mediation analyses were performed for each variable, and ANOVA, the Akaike information criterion (AIC), and the Bayesian information criterion (BIC) were used to compare the key-variable model with the full-variable model and select the optimal model. Unsupervised PCA was then performed on these variables.
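The VIF screen in the final step can be illustrated with a short sketch. It uses the standard identity that the VIF of predictor j, 1/(1 − R²_j), equals the j-th diagonal element of the inverse of the predictors' correlation matrix. This is a pure-Python illustration on toy data, not the authors' R code; the function names `correlation_matrix`, `invert`, and `vif` are hypothetical.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    n = len(columns[0])
    unit = []
    for col in columns:
        mean = sum(col) / n
        dev = [v - mean for v in col]
        norm = math.sqrt(sum(d * d for d in dev))
        unit.append([d / norm for d in dev])  # centered, unit-length column
    return [[sum(a * b for a, b in zip(u, v)) for v in unit] for u in unit]

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    k = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

def vif(columns):
    """VIF_j = j-th diagonal element of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]
```

With two uncorrelated toy columns both VIFs equal 1; adding a third column that nearly duplicates the first pushes the VIFs of that pair far above the cutoff of 5 used in the text.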
2.4. Verification of the dynamic nomogram
In the training and validation sets, the discriminatory performance of the nomogram was compared with that of the APACHE II score using receiver operating characteristic (ROC) curves and the area under the curve (AUC). Calibration curves were used to assess the accuracy of the model predictions. Decision curve analysis (DCA) was performed in the validation set to assess whether the model offers a net clinical benefit in decision-making beyond its statistical performance.
2.5. Exploration of the relationships between key factors via principal component analysis (PCA)
Correlation analysis was performed between the candidate variables and the encephalopathy risk predicted by the model, and variables with strong correlations (|r| > 0.5) were included in the PCA to identify latent structures. After the predictive variables were standardized, the optimal number of principal components was selected on the basis of a cumulative explained variance >70% and the eigenvalue magnitude. Specifically, when the cumulative variance approaches a critical threshold (e.g., 70–80%): (1) if removing a component with an eigenvalue slightly <1 causes a marked decrease (>10%) in cumulative variance, the component should be retained; and (2) components with eigenvalues between 0.9 and 1.1 that have no practical significance can be removed. 11
Logistic regression was used to assess the association between principal component scores and encephalopathy, with results reported as adjusted odds ratios per 1-standard-deviation increase. Relationships between key variables within the principal components were explored preliminarily on the basis of absolute loadings >0.4 or clinically plausible relationships.
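The component-retention rule above can be sketched in plain Python. Standardizing the variables makes their covariance matrix equal to the correlation matrix; its eigenvalues (computed here with cyclic Jacobi rotations) give each component's share of variance, and the smallest number of leading components reaching the 70% cumulative threshold is kept. This is a minimal sketch under those assumptions, not the authors' R workflow; the helper names are hypothetical, eigenvectors (needed for the |loading| > 0.4 rule) are omitted for brevity, and the eigenvalue refinements (1) and (2) are left to judgment.

```python
import math

def correlation_matrix(columns):
    """Correlation matrix = covariance matrix of z-standardized variables."""
    n = len(columns[0])
    unit = []
    for col in columns:
        mean = sum(col) / n
        dev = [v - mean for v in col]
        norm = math.sqrt(sum(d * d for d in dev))
        unit.append([d / norm for d in dev])
    return [[sum(a * b for a, b in zip(u, v)) for v in unit] for u in unit]

def jacobi_eigenvalues(sym, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations (descending)."""
    a = [row[:] for row in sym]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                sign = 1.0 if theta >= 0 else -1.0
                t = sign / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J  (update columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A  (update rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

def n_components(eigenvalues, threshold=0.70):
    """Smallest number of leading components whose cumulative variance share >= threshold."""
    total = sum(eigenvalues)
    cumulative = 0.0
    for k, lam in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += lam / total
        if cumulative >= threshold:
            return k
    return len(eigenvalues)
```

As a check against the Results, variance shares of 53.3%, 13.9%, and 11.6% for the first three components (the remaining shares below are invented to fill out the total) give 67.2% after two components and 78.8% after three, so `n_components` retains three.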
2.6. Statistical analysis
Statistical analyses were performed in R (v4.3.3).
All statistical tests were two-sided, with statistical significance set at P < 0.05. The main R packages used were gtsummary, tidyverse, dplyr, rms, dcurves, pROC, nomogramFormula, and ggplot2. All analyses were reported according to the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) guidelines. 12
3. Results
3.1. Clinical baseline information
This study ultimately included 192 patients with SFTS, among whom 58 (30.2%) developed viral encephalopathy. There were no significant differences between the training set (n=134) and the validation set (n=58) in terms of demographic characteristics, underlying conditions, or laboratory indicators, indicating balanced baseline characteristics between the two groups (Supplemental Figure 1, Supplemental Table 1). A comparison between the encephalitis group and the nonencephalitis group revealed that encephalitis patients had higher viral loads (VLs) (P<0.001), LDH levels (P<0.001), BNP levels (P<0.001), and IL-8 levels (P<0.001), whereas CD4+T-cell counts were significantly lower (P<0.001) (Supplemental Table 2).
3.2. Prediction model feature selection
To identify potential risk factors for encephalopathy in patients with severe fever with thrombocytopenia syndrome, we initially screened 30 candidate variables through univariate analysis coupled with the Boruta algorithm. This process classified 26 variables (including PLT and CD4+ T cells) as significant predictors, whereas 4 variables (including CRP and CD8+ T cells) remained tentative, with others being excluded (Figure 1(a)). We then applied SHAP value-driven recursive feature elimination to refine the selection, ultimately identifying 9 optimal features (VL, urea, AST, etc.) that yielded the peak cross-validation accuracy (0.886; Figure 1(c) and (d)). Variance inflation factor analysis confirmed minimal multicollinearity among these variables (all VIFs <5; Figure 1(d)). LASSO regression identified the optimal predictive model at λ1se = 0.018, which retained 7 clinically relevant variables (VL, LDH, CD4+ T cells, etc.; Figure 1(e) and (f)). These findings highlight VL, LDH, and CD4+ T cells as pivotal biomarkers for encephalopathy risk stratification in this population.
Figure 1. Machine learning-driven biomarker pipeline for predicting viral encephalopathy in SFTS patients.
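The λ1se value reported above comes from the "one-standard-error" rule used, for example, by `cv.glmnet` in the R package glmnet, which reports both `lambda.min` and `lambda.1se`. A minimal sketch of the rule, assuming the cross-validated mean error and its standard error are already available for each λ (the function name and the toy numbers are hypothetical, not the study's data):

```python
def lambda_min_and_1se(lambdas, cv_error, cv_se):
    """Return (lambda_min, lambda_1se) under the one-standard-error rule.

    lambda_min minimizes the mean cross-validated error; lambda_1se is the
    largest (most regularized) lambda whose mean error is within one standard
    error of that minimum, which yields a sparser model.
    """
    i_min = min(range(len(lambdas)), key=lambda i: cv_error[i])
    threshold = cv_error[i_min] + cv_se[i_min]
    eligible = [lam for lam, err in zip(lambdas, cv_error) if err <= threshold]
    return lambdas[i_min], max(eligible)

# Toy cross-validation curve (illustrative values only)
toy_lambdas = [0.2, 0.1, 0.05, 0.02, 0.01]
toy_error = [0.70, 0.55, 0.47, 0.45, 0.46]
toy_se = [0.03] * 5
lam_min, lam_1se = lambda_min_and_1se(toy_lambdas, toy_error, toy_se)
```

Choosing the larger λ trades a negligible loss in cross-validated error for stronger shrinkage, which is why λ1se typically retains fewer variables than λmin.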
3.3. Constructing models
Multivariate logistic regression analysis of risk factors for encephalopathy in severe fever with thrombocytopenia syndrome.
Notes. OR = odds ratio; CI = confidence interval; LDH = lactate dehydrogenase; BNP = B-type natriuretic peptide; VL = viral load; APTT = activated partial thromboplastin time; AST = aspartate aminotransferase; IL-8 = interleukin-8. BNP and VL values were log-transformed. Bold text indicates statistically significant associations (P < 0.05).
To estimate the probability of viral encephalopathy in patients with severe fever with thrombocytopenia syndrome, we developed a dynamic nomogram incorporating key biomarkers. For Patient A, with IL-8 at 27.33 pg/mL, AST at 564.9 U/L, APTT at 38.6 s, CD4+ T cells at 77 cells/μL, a viral load (VL) of 0.5 log10 copies/mL, BNP at 40.7 pg/mL, and LDH at 813 U/L, the model predicted a 34.9% probability of viral encephalopathy (Figure 2(a)). In contrast, Patient B had a markedly higher predicted risk of 80.8%, with IL-8 at 196.01 pg/mL, AST at 252.9 U/L, APTT at 39.4 s, CD4+ T cells at 86 cells/μL, VL at 1.0 log10 copies/mL, BNP at 199.4 pg/mL, and LDH at 1699 U/L (Figure 2(b)). The nomogram sums the points assigned to each variable to obtain a total score, which is then converted into a probability estimate. This visualization highlights the roles of IL-8, AST, APTT, CD4+ T-cell count, VL, BNP, and LDH in modulating the risk of viral encephalopathy and provides a practical tool for clinical risk stratification.
Figure 2. Dynamic nomogram for predicting the probability of viral encephalopathy in patients with severe fever with thrombocytopenia syndrome. Panels (a) and (b) show the model's predictions for two different patients at different stages of the disease. (a) A patient with IL-8 of 27.33 pg/mL, AST of 564.9 U/L, APTT of 38.6 s, CD4+ count of 77 cells/μL, VL of 0.5 log10 copies/mL, BNP of 40.7 pg/mL, and LDH of 813 U/L, with a predicted 35% risk of viral encephalopathy. (b) A patient with IL-8 of 196.01 pg/mL, AST of 252.9 U/L, APTT of 39.4 s, CD4+ count of 86 cells/μL, VL of 1.0 log10 copies/mL, BNP of 199.4 pg/mL, and LDH of 1699 U/L, with a predicted 80.8% risk of viral encephalopathy.
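The points-to-probability conversion described above can be sketched as follows, assuming the usual construction for a logistic-regression nomogram: the predictor with the widest log-odds span is scaled to 0–100 points, every other axis uses the same log-odds-per-point scale, and the total points are mapped back to a linear predictor and passed through the logistic function. The `Axis` class and all coefficients and intercepts below are hypothetical placeholders, not the fitted model, so the resulting probabilities will not reproduce the 34.9% or 80.8% reported for Patients A and B.

```python
import math
from dataclasses import dataclass

@dataclass
class Axis:
    name: str
    beta: float   # HYPOTHETICAL log-odds coefficient per unit of the predictor
    low: float    # lower end of the nomogram axis
    high: float   # upper end of the nomogram axis

    def reference(self):
        # The axis end that receives 0 points (low end for risk factors,
        # high end for protective factors with a negative coefficient).
        return self.low if self.beta >= 0 else self.high

def logodds_per_point(axes):
    """Scale so that the axis with the widest log-odds span covers 0-100 points."""
    return max(abs(a.beta) * (a.high - a.low) for a in axes) / 100.0

def points(axis, value, scale):
    """Nonnegative points for a value lying within [low, high]."""
    return axis.beta * (value - axis.reference()) / scale

def probability_from_total_points(total, intercept, axes, scale):
    # Linear predictor at the all-zero-points reference, plus the points' log-odds.
    lp = intercept + sum(a.beta * a.reference() for a in axes) + total * scale
    return 1.0 / (1.0 + math.exp(-lp))

# Illustrative two-axis nomogram with made-up coefficients
example_axes = [Axis("LDH", 0.004, 100.0, 2000.0), Axis("CD4+ T", -0.01, 0.0, 800.0)]
example_scale = logodds_per_point(example_axes)
example_total = (points(example_axes[0], 813.0, example_scale)
                 + points(example_axes[1], 77.0, example_scale))
example_risk = probability_from_total_points(example_total, -2.0, example_axes, example_scale)
```

Because the reference offsets cancel, the probability read from the total points equals the logistic function of the ordinary linear predictor, which is what makes the graphical nomogram equivalent to the regression model.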
3.4. Dynamic nomogram model evaluation and validation
To evaluate the nomogram's accuracy in predicting viral encephalopathy risk in patients with severe fever with thrombocytopenia syndrome, we compared its AUC with that of the APACHE II score in the training and validation sets. In the training set, the nomogram achieved a higher AUC of 0.958 (95% CI: 0.930–0.987) than the APACHE II score (0.839; 95% CI: 0.839–0.951; P = 0.04972) (Figure 3(a)). In the validation set, the nomogram achieved an AUC of 0.974 (95% CI: 0.924–1.000) versus 0.926 (95% CI: 0.860–0.991) for the APACHE II score, although this difference was not statistically significant (P = 0.18) (Figure 3(b)).
Figure 3. Comparative ROC analysis of the nomogram and APACHE II score for predicting viral encephalopathy risk in SFTS patients ((a) training set; (b) validation set).
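The AUC values compared above have a simple probabilistic reading: the AUC is the probability that a randomly chosen patient with encephalopathy receives a higher predicted score than a randomly chosen patient without it (the Mann–Whitney statistic). A minimal sketch of that estimator follows; the function name is hypothetical, and the test for comparing two correlated AUCs (for example, the DeLong method available through `roc.test()` in the pROC package) is not reproduced here.

```python
def auc(scores, labels):
    """Empirical ROC AUC as the Mann-Whitney concordance probability.

    Counts case/non-case pairs in which the case (label 1) has the higher
    score; tied pairs count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

An AUC of 0.974 therefore means that, in roughly 97 of 100 such case/non-case pairs, the nomogram ranks the patient with encephalopathy higher.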
Bootstrap optimism-corrected performance of the encephalopathy prediction nomogram.
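The bootstrap optimism correction summarized in this table can be sketched as follows, assuming Harrell's standard procedure: refit the model on each bootstrap resample, measure how much better it performs on that resample than on the original data (the "optimism"), and subtract the average optimism from the apparent performance. This is a self-contained, pure-Python illustration with an unpenalized Newton–Raphson logistic fit and the AUC as the performance measure; all function names are hypothetical. In R, the rms package listed in the Methods provides `validate()` for this purpose, but whether the authors used it is not stated.

```python
import math
import random

def auc(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_logistic(X, y, iters=15, ridge=1e-6):
    """Logistic regression by Newton-Raphson; the tiny ridge term on the
    Hessian only stabilizes the linear solve and does not change the optimum."""
    p = len(X[0]) + 1
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        hess = [[ridge if i == j else 0.0 for j in range(p)] for i in range(p)]
        for xi, yi in zip(X, y):
            z = [1.0] + list(xi)
            eta = max(-30.0, min(30.0, sum(b * v for b, v in zip(beta, z))))
            mu = 1.0 / (1.0 + math.exp(-eta))
            for i in range(p):
                grad[i] += (yi - mu) * z[i]
                for j in range(p):
                    hess[i][j] += mu * (1.0 - mu) * z[i] * z[j]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta

def linear_predictor(beta, X):
    return [beta[0] + sum(b * v for b, v in zip(beta[1:], xi)) for xi in X]

def optimism_corrected_auc(X, y, n_boot=50, seed=2024):
    """Return (apparent AUC, optimism-corrected AUC)."""
    rng = random.Random(seed)
    apparent = auc(linear_predictor(fit_logistic(X, y), X), y)
    n, optimism = len(y), []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # a resample must contain both outcome classes
        beta_b = fit_logistic(Xb, yb)
        optimism.append(auc(linear_predictor(beta_b, Xb), yb)
                        - auc(linear_predictor(beta_b, X), y))
    return apparent, apparent - sum(optimism) / len(optimism)
```

The corrected estimate approximates how the model would perform in new patients from the same population, which is why it is usually slightly lower than the apparent AUC.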
These results suggest that the nomogram predicts the risk of viral encephalopathy more accurately than the conventional APACHE II scoring system, although the difference in the validation cohort did not reach statistical significance. The calibration curves for both the training and validation sets showed good agreement between the observed and predicted probabilities (Figure 4). The P values of the Hosmer–Lemeshow goodness-of-fit (GOF) test were 0.48 and 0.123 in the training and validation sets, respectively. Decision curve analysis (DCA) was used to compare the net benefit of the nomogram with that of the APACHE II score. In the validation cohort, intervention guided by the nomogram yielded a greater net clinical benefit than the APACHE II score across threshold probabilities (Figure 5).
Figure 4. Calibration plots for assessing model calibration performance: (A) training set; (B) validation set.
Figure 5. Decision curve analysis showing the net benefit of distinct strategies as the high-risk threshold varies.
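The net benefit plotted in a decision curve follows a standard formula: at a threshold probability p_t, treating every patient whose predicted risk is at least p_t yields TP/n − (FP/n) × p_t/(1 − p_t), where the odds p_t/(1 − p_t) encode how many false positives are acceptable per true positive. The sketch below computes this for a model and for the "treat all" reference strategy ("treat none" always has a net benefit of 0). The function names are hypothetical, and this is not the dcurves implementation.

```python
def net_benefit(probabilities, labels, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    n = len(labels)
    tp = sum(1 for p, y in zip(probabilities, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probabilities, labels) if p >= threshold and y == 0)
    odds = threshold / (1.0 - threshold)  # exchange rate between FP and TP
    return tp / n - fp / n * odds

def net_benefit_treat_all(labels, threshold):
    """Reference strategy: treat everyone regardless of predicted risk."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)
```

A model is clinically useful over a range of thresholds when its curve lies above both the "treat all" curve and the zero line of "treat none".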

3.5. Correlations among key variables and principal component analysis
Correlation coefficients between variables and significance test results.
Mediation effect of lactate dehydrogenase (LDH).
To investigate the structural relationships among key biomarkers associated with encephalopathy in patients with severe fever with thrombocytopenia syndrome, we conducted principal component analysis (PCA), excluding the relatively independent biomarkers VL, CD4+ T cells, and urea. The first three principal components accounted for 78.8% of the total variance. PC1 explained 53.3% of the variance and primarily reflected tissue damage-related indicators, with LDH and AST contributing loadings of 0.46 and 0.45, respectively (Supplemental Figure 3C). PC2 accounted for 13.9% of the variance and represented inflammation- and coagulation-related dimensions, with IL-8 and APTT contributing loadings of 0.22 and 0.41, respectively. PC3 explained 11.6% of the variance and was associated with coagulation and immune regulation, with CK contributing a loading of 0.47. Because the cumulative explained variance exceeded 70%, and considering the eigenvalue magnitudes, we retained the first three principal components (PC1–PC3). These findings suggest potential structural relationships among biomarkers in the development of encephalopathy and indicate that tissue damage, inflammation, and coagulation may be key mechanisms in its pathogenesis.
To visualize the distribution of key biomarkers associated with encephalopathy in patients with severe fever with thrombocytopenia syndrome, we generated PCA biplots (Supplemental Figure 4). Nonencephalopathy samples (red dots) clustered more tightly than encephalopathy samples (green triangles) across PC1, PC2, and PC3. In the PC1 vs. PC2 biplot, AST and LDH were strongly and positively aligned along PC1, whereas Risk, APTT, IL-8, and BNP had higher loadings on PC2. The PC2 vs. PC3 biplot underscored the strong associations among Risk, APTT, and IL-8, with CK loading strongly and positively on PC3. In the PC1 vs. PC3 biplot, Risk also displayed small vector angles with APTT and IL-8. Compared with other factors, APTT, IL-8, and CK had the smallest vector angles relative to Risk, suggesting that they may be pivotal in encephalopathy pathogenesis.
Logistic regression analysis of encephalopathy risk associated with principal components (PC1–PC3).
Note. Values represent odds ratios (OR) with 95% confidence intervals (CI) and corresponding P values. PC1 showed an exceptionally strong association (OR = 93.1, P = 1.37 × 10⁻⁷), whereas PC2 showed a moderate association (OR = 6.34, P = 0.003). PC3 was not significantly associated with encephalopathy risk (P = 0.705).

Threshold effect analysis of the effect of IL-8 on encephalopathy risk.
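The text does not specify how the threshold model was parameterized. One common way to express a threshold effect is a two-piecewise (segmented) logistic model with a hinge term at the cutoff, logit(p) = b0 + b1·x + b2·max(0, x − c), so that the odds ratio per unit of x is exp(b1) below the cutoff c and exp(b1 + b2) above it. The sketch below assumes that formulation, with c = 90.1 pg/mL taken from the Discussion; the function names and coefficients are hypothetical.

```python
import math

def hinge_features(x, knot):
    """Two-piecewise coding of one predictor: x and max(0, x - knot)."""
    return [x, max(0.0, x - knot)]

def segment_odds_ratios(beta_x, beta_hinge, per=1.0):
    """Odds ratios per `per` units of x below and above the knot for
    logit(p) = b0 + beta_x * x + beta_hinge * max(0, x - knot)."""
    below = math.exp(beta_x * per)
    above = math.exp((beta_x + beta_hinge) * per)
    return below, above

IL8_KNOT = 90.1  # pg/mL, cutoff reported in the text
# Hypothetical coefficients: flat below the knot, rising risk above it
or_below, or_above = segment_odds_ratios(0.0, 0.02, per=10.0)
```

Such a model is typically compared with a single linear term by a likelihood ratio test; a significant improvement supports a genuine change in slope at the cutoff.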
4. Discussion
A machine learning-driven nomogram incorporating seven common biomarkers (LDH, BNP, IL-8, prolonged APTT, CD4+ T-cell count, AST, and viral load) can accurately predict viral encephalopathy in patients with severe fever with thrombocytopenia syndrome (SFTS). 13,14
4.1. From PCA axes to an integrated pathological model: Theoretical links with established biomarkers
Our PCA revealed three core pathological axes: tissue damage (PC1), inflammation-coagulation (PC2), and immune dysregulation (PC3). Although this study did not measure ferritin (a key biomarker repeatedly validated in the prediction of SFTS encephalitis and mortality 15,16), our findings may help contextualize its central role. We propose a hypothetical integrative model: SFTSV infection may initially trigger widespread immune dysregulation (PC3), characterized by CD4+ T-cell depletion, leading to failure of viral control and disruption of immune homeostasis. This, in turn, may drive an intense inflammatory response (PC2), in which factors such as IL-8 mediate neutrophil infiltration and endothelial damage, with inflammation and coagulation activation (prolonged APTT) reinforcing each other. Inflammation and direct viral damage jointly lead to multi-organ tissue necrosis (PC1, elevated LDH/AST). The hyperferritinemia consistently reported in the literature is likely a key manifestation and amplifier of this cascade; it serves as a marker of macrophage activation and cytokine storm, 17 and may also exacerbate the pathological processes of PC2 and PC1 through its pro-inflammatory properties. Therefore, the axes identified in this model are not isolated; they depict an upstream immune-inflammatory storm that may be common to SFTS encephalitis, with ferritin potentially serving as a serological hub connecting these axes. Future studies validating these findings in cohorts that simultaneously measure ferritin and the biomarkers of this model would be of great value.
4.2. Key discoveries
4.2.1. Tissue injury
LDH is identified as the strongest independent predictor (
4.2.2. Cardiocerebral interaction
BNP has a strong association with encephalitis
4.2.3. Immune dysfunction
The study found that CD4+ T cells (OR = 0.55) act as a protective factor, which is consistent with previous findings that lymphocytopenia exacerbates the severity of SFTS, 21 providing clinical support for a key pathogenic mechanism of SFTSV. This association aligns with in vitro and pathological evidence: Li et al. 22 demonstrated that SFTSV induces T-cell apoptosis through the Fas/FasL pathway. Our observation that patients who develop encephalitis have significantly lower CD4+ T-cell counts suggests that this virus-induced lymphopenia is clinically consequential. The depletion of these crucial immune cells likely creates a permissive environment for unchecked viral replication and weakens the regulation of inflammatory responses, thereby facilitating progression to severe neurological complications. This is consistent with the findings of Wang et al. that the pathogenesis of SFTSV results from the combined effects of viral immune evasion and an excessive host inflammatory response. 23 The independence of CD4+ T-cell count from viral load in our correlation analysis further implies that this immune dysregulation axis (PC3) contributes to encephalitis risk through mechanisms beyond direct viral cytopathy, possibly involving loss of immune homeostasis and failure to control the inflammatory cascade represented by PC2.
4.2.4. Synthesized pathophysiological model for SFTS-associated encephalopathy
Collectively, our biomarker analysis and PCA, when interpreted in the context of prior research, support a synergistic model for SFTS encephalitis. The process may be initiated by a high viral load, leading to direct tissue injury (PC1: ↑LDH/AST) and triggering a pro-inflammatory state (PC2: ↑IL-8, ↑APTT). SFTSV simultaneously drives immune dysregulation (PC3: ↓CD4+ T cells) via apoptosis, impairing host defense and inflammatory control. These axes are not linear but interactive: tissue damage releases DAMPs that fuel inflammation, while the inflammatory and coagulopathic milieu (PC2) exacerbates endothelial and organ injury. Biomarkers such as ferritin, which was not measured in this study, likely rise as a consequence of this integrated storm. This framework moves beyond listing independent risk factors to proposing a networked pathophysiology, in which the biomarkers in our nomogram are not just predictors but sentinels of interconnected biological processes. This mechanistic understanding points to potential therapeutic targets, such as modulating the IL-8 pathway or mitigating T-cell apoptosis.
4.2.5. IL-8 threshold
IL-8 shows a nonlinear threshold effect (cutoff 90.1 pg/mL), similar to what has been reported in COVID-19 encephalitis. 24 When serum IL-8 exceeds the identified threshold of 90.1 pg/mL, we recommend immediate escalation of anti-inflammatory and immunomodulatory strategies to interrupt the neutrophil-driven cascade toward encephalitis. Clinically, this may include early initiation of moderate-to-high-dose corticosteroids (e.g., methylprednisolone 1–2 mg/kg/day for 3–5 days, tapered according to response), combined with close neurological monitoring (daily Glasgow Coma Scale and EEG if available) and consideration of ICU transfer if additional organ dysfunction emerges. In centers with access to experimental IL-8 pathway inhibitors or biologics, these could be evaluated under compassionate use. This threshold-guided approach mirrors successful biomarker-directed protocols in COVID-19 encephalitis and ARDS, where exceeding similar cytokine cutoffs prompted timely immunosuppression and improved neurological outcomes. By providing an actionable, serum-based trigger even in thrombocytopenic patients for whom lumbar puncture is contraindicated, the model offers practical decision support at the bedside and may reduce progression to severe encephalopathy.
4.2.6. Superiority over APACHE II
In the validation set, the nomogram showed higher discrimination than the APACHE II score (AUC 0.974 vs. 0.926), although this difference was not statistically significant (P = 0.18), together with good calibration (Hosmer–Lemeshow P = 0.123). This suggests that disease-specific biomarker panels may be more effective than generic severity scores, similar to findings in recent sepsis studies. 25
4.2.7. Comparison with existing prediction models for SFTS encephalitis
Several prediction models for SFTS-associated encephalopathy have recently been developed.
In comparison, our machine learning-driven nomogram incorporates seven readily available biomarkers (LDH, BNP, IL-8, prolonged APTT, CD4+ T-cell count, AST, and viral load) and combines advanced feature selection (Boruta + SHAP-RFE) with PCA-derived mechanistic insights, achieving higher discrimination (AUC 0.974 vs. 0.895 in Wu et al.), although the two models were evaluated in different cohorts. Unlike Wu et al.'s model, which emphasizes demographic and ferritin-based risk stratification suitable for rapid bedside screening in resource-limited settings, our approach captures dynamic immune-inflammatory and tissue-injury axes (PC1–PC3), offering greater mechanistic interpretability and actionable thresholds (e.g., IL-8 > 90.1 pg/mL for immunomodulation). This makes our nomogram particularly applicable to precision risk stratification in tertiary centers where biomarker panels and dynamic monitoring are feasible, while remaining complementary to ferritin-inclusive models for broader validation.
Our model extends this work in several important directions. First, by incorporating CD4+ T cell counts, we directly capture the virus-induced immune dysregulation that is central to SFTS pathogenesis. 26 Second, the inclusion of IL-8—a key neutrophil chemokine—reflects the inflammatory cascade underlying encephalitis development. Third, our use of SHAP analysis enabled identification of nonlinear threshold effects (e.g., IL-8 > 90.1 pg/mL) that can guide clinical decision-making, such as when to consider intensified anti-inflammatory therapy.
In terms of predictive performance, our model’s AUC of 0.974 in the validation set is highly competitive. More importantly, we believe the mechanistic interpretability (via immune-inflammatory markers) and granular clinical guidance (via SHAP-derived thresholds) offered by our approach represent meaningful advances beyond predictive accuracy alone. Future multicenter studies are warranted to validate our findings and to assess whether the inclusion of these immune-inflammatory markers improves risk stratification and clinical outcomes in broader populations.
4.3. Study limitations
The study has some limitations. First, its single-center retrospective design limits generalizability, especially considering regional variations in SFTSV strains.15,27 The lack of serial VL measurements prevents analysis of the temporal impact of viral kinetics on encephalitis risk. Moreover, the absence of cerebrospinal fluid biomarkers restricts the understanding of intrathecal inflammation. 24
In addition, this study has the following limitations. First, the impact of therapeutic interventions on the risk of encephalitis was not fully evaluated. Previous studies have shown that the combined use of IVIG (>80 g) and glucocorticoids can reduce the mortality rate of neurological complications associated with SFTS.15,28 Despite the retrospective design, treatment confounding was not addressed through propensity score matching or time-dependent analysis; validation in prospective cohorts is warranted. Second, the study did not examine the moderating effects of underlying conditions (e.g., hypertension). Hypertension has been established as an independent risk factor for SFTS encephalitis 15 and may influence the predictive performance of the model through vascular endothelial damage or amplified inflammation; although this study collected some comorbidity data, no interaction analyses were conducted, and future studies could use stratified analyses or machine learning models to explore this further. These limitations may restrict the generalizability of the predictive model but are unlikely to affect the internal validity of the core findings.
4.4. Future research directions
Future research should focus on multicenter validation to enhance the model's robustness. Exploring dynamic biomarker monitoring, for example through mHealth platforms, for real-time risk stratification is also crucial. Integrating multiomics data could reveal genotype-phenotype interactions related to encephalitis susceptibility. Interventional studies targeting IL-8 or CD4+ T-cell preservation may help validate therapeutic strategies. Although our model incorporates new immune and inflammatory biomarkers (CD4+ T cells, IL-8), direct comparisons of its performance with models based solely on conventional biomarkers (such as ferritin) are limited. Future studies should validate and compare this model in prospective, multicenter cohorts that include a more comprehensive set of biomarkers. Additionally, extending PCA-guided mechanistic frameworks to other viral encephalitides may uncover universal pathophysiological axes and improve global management paradigms.
5. Conclusions
In conclusion, using PCA, our study identifies three core pathophysiological axes in SFTS patients: tissue injury (LDH/AST), inflammation-coagulation interaction (IL-8/APTT), and immune dysregulation. These axes provide novel insights into disease progression and prognosis. By discussing these axes in relation to the existing literature, including the roles of ferritin, LDH, and IL-8 and the protective effects of CD4+ T cells in relation to SFTSV-induced apoptosis, we deepen the mechanistic understanding of this complication. Moreover, the nonlinear threshold effect of IL-8 at a cutoff of 90.1 pg/mL offers substantial clinical translational potential for early risk stratification. These findings advocate for early biomarker monitoring to improve patient outcomes…particularly when IL-8 surpasses 90.1 pg/mL, enabling timely anti-inflammatory intervention. Furthermore, drawing on the rigorous validation framework and emphasis on clinical practicality in Tan et al. (OHCCPredictor: an online risk stratification model for predicting survival duration of older patients with hepatocellular carcinoma, Hepatol Int, 2024), 29 we have developed a corresponding offline calculator or simplified scoring tool for the nomogram proposed in this study (see Appendix). This tool is intended to improve accessibility, facilitate rapid risk assessment within routine clinical workflows, and enhance the nomogram's translational value, particularly in resource-limited settings.
Supplemental material
Supplemental material - Machine learning-driven nomogram for predicting viral encephalopathy risk in SFTS patients
Supplemental material for Machine learning-driven nomogram for predicting viral encephalopathy risk in SFTS patients by Daguang Cui, Chenhu Ma, Dan Wang, Lingyan Xiao, Dongyang Shi, Hui Dong, Kai Yang, Yishan Zheng in DIGITAL HEALTH
Footnotes
Ethical considerations
The study was approved by the ethics committee of the Second Hospital of Nanjing (2023-L-S-023). All procedures involving human participants were conducted in accordance with the ethical standards of the Declaration of Helsinki.
Consent to participate
Written informed consent was obtained from all individual participants included in the study.
Consent for publication
All authors confirm that written informed consent was obtained from the patients/participants (or their legal guardians if applicable) for the publication of their individual clinical data, images, or any other potentially identifiable information included in this manuscript. The data presented have been anonymized to protect patient privacy and comply with ethical standards. No identifiable details that could disclose the identity of participants are included in this publication.
Authors’ contributions
DGC, CHM and DW contributed equally as first authors. YSZ, KY and HD contributed equally as corresponding authors. DGC and KY contributed to writing the paper. HD, CHM and DW developed the methodology and contributed to revising the paper. LYX and DYS conducted the statistical analysis. YSZ provided financial support. All authors approved the final manuscript.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: grants from Nanjing Second Hospital Reserve Talents (0316301), the Talent Lift Project of Nanjing Second Hospital (RCMS23010), the Natural Science Foundation Project of Nanjing University of Chinese Medicine (XZR2024043), the 2023 Nanjing Second Hospital Talent Support Project Grant (RCZD23003), and the Nanjing Infectious Disease Clinical Medical Center, Innovation Center for Infectious Diseases of Jiangsu Province (CXZX202232).
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
Owing to patient privacy rights, the research data will not be publicly disclosed. Questions or requests should be directed to the corresponding author.
Supplemental material
Supplemental material for this article is available online.
References
