Abstract
Background:
Febrile neutropenia (FN) frequently occurs as a complication among patients with diffuse large B-cell lymphoma (DLBCL) receiving their initial rituximab plus cyclophosphamide, doxorubicin, vincristine, and prednisone (R-CHOP) therapy. This study aimed to identify the risk factors for FN and create a reliable predictive model for FN using machine learning methods.
Methods:
This retrospective study evaluated 238 patients newly diagnosed with DLBCL and treated with the R-CHOP regimen. Logistic regression was used to identify the risk factors for FN. In addition, a machine learning model was developed to predict the occurrence of FN.
Results:
The incidence rate of FN was 23.9%. Univariate analysis revealed significant associations between FN and bone marrow involvement (odds ratio [OR], 2.78; 95% confidence interval [CI], 1.36-5.66; P = .005), stage III and IV disease (OR, 3.39; 95% CI, 1.69-6.84; P = .001), Eastern Cooperative Oncology Group Performance Status score of ⩾ 2 (OR, 2.75; 95% CI, 1.13-6.67; P = .025), lactate dehydrogenase levels ⩾ 240 U/L (OR, 2.68; 95% CI, 1.35-5.32; P = .005), and involvement of at least 2 extranodal sites (OR, 2.69; 95% CI, 1.44-5.02; P = .002). Machine learning techniques were applied to construct predictive models for FN, achieving C-statistics of 0.752 to 0.879 in cross-validation and 0.692 to 0.861 in independent training and testing experiments. Notably, patients with FN (57.6%) had a substantially inferior 5-year overall survival (OS) rate than those without FN (77.1%) (P = .007).
Conclusions:
Patients with DLBCL who develop FN after their first R-CHOP treatment have a significantly worse OS than those without FN. Tailored prophylaxis with granulocyte-colony stimulating factor and antibiotics may be essential in this population. Models with moderate to strong predictive power can be designed using various artificial intelligence techniques.
Introduction
Diffuse large B-cell lymphoma (DLBCL) is the most common non-Hodgkin lymphoma (NHL), comprising 30% to 40% of cases worldwide.1-3 Diffuse large B-cell lymphoma is characterized by the rapid growth of large lymphocytes in the lymph nodes and marked biological heterogeneity, leading to variable clinical outcomes and a need for individualized therapy. Advances in molecular profiling classify DLBCL into germinal center B-cell–like and activated B-cell–like subtypes with distinct prognoses and treatment responses, reinforcing tailored treatment and supporting risk-adapted management. 4 Rituximab, cyclophosphamide, doxorubicin, vincristine, and prednisolone (R-CHOP) remains the standard regimen and has significantly improved overall survival (OS). 5 For relapsed or refractory disease, chimeric antigen receptor T-cell therapy and bispecific T-cell engagers have shown encouraging activity, offering hope for patients with this aggressive lymphoma.6,7
Febrile neutropenia (FN) is a major complication of R-CHOP in DLBCL. In the work by Zheng et al, 8 20.4% of newly diagnosed patients developed FN during cycle 1. Febrile neutropenia not only increases the risk of severe infection but also leads to hospitalizations, treatment delays, dose reductions, and higher costs. Therefore, early risk stratification is essential. Reported risk factors include female sex, higher average relative dose, advanced age, and hypoalbuminemia.8,9 Immune-impairing comorbidities further raise risk; for example, FN-related admissions are more frequent in patients with diabetes mellitus than in those without. 10 Effective management requires careful monitoring, appropriate antimicrobial prophylaxis, and supportive care to reduce serious complications and mortality.
Machine learning (ML), a subset of artificial intelligence, has made significant strides in medicine, revolutionizing diagnosis, treatment, and outcome prediction across diseases, with notable applications in medical imaging, genomics, and drug discovery. 11 The integration of ML in medicine is also promising for enhancing individualized medicine. By analyzing large amounts of patient data, including genetic information, lifestyle factors, and medical history, ML models can predict patient responses to various treatments, thereby optimizing health care outcomes. Cho et al 12 reported that compared with traditional statistical models, ML more accurately predicts chemotherapy-related FN in breast cancer. However, the effectiveness of ML in predicting FN for patients with DLBCL remains uncertain. As such, this study aimed to identify the risk factors for FN and create a reliable predictive model for FN using ML methods.
To this end, we analyzed the data of patients newly diagnosed with DLBCL undergoing their initial cycle of R-CHOP treatment. In addition, ML techniques were used to design predictive models for identifying patients at higher risk of FN after their initial R-CHOP therapy.
Materials and Methods
Study design and patients
This retrospective study was approved by the Institutional Review Board and was conducted according to the tenets of the Declaration of Helsinki. The need for informed consent was waived owing to the retrospective nature of the research. The reporting of this study adheres to the Strengthening the Reporting of Observational Studies in Epidemiology Statement. 13
A total of 283 patients with newly diagnosed DLBCL who received their first cycle of R-CHOP chemotherapy at Taichung Veterans General Hospital, Taiwan, between 2015 and 2020 were evaluated. Among them, patients with human immunodeficiency virus infection (n = 6), aged < 18 years (n = 1), with another type of solid organ cancer or hematologic malignancy prior to this study (n = 16), with primary central nervous system DLBCL (n = 8), with incomplete diagnoses (n = 5), and with lymphoproliferative diseases following solid organ transplantation (n = 1) were excluded. Ultimately, 238 patients met all eligibility criteria and were included in the analysis (Figure 1). The patients were followed up for a median of 37.80 months (range, 0.57-90.37 months). Follow-up was terminated upon death or on August 31, 2022, whichever came first.

Study flowchart.
Treatment protocol
The standard R-CHOP regimen consisted of rituximab (375 mg/m²), cyclophosphamide (750 mg/m²), doxorubicin (50 mg/m²), vincristine (1.4 mg/m²), and prednisolone (40 mg/m² for 5 days). Primary antibacterial prophylaxis was not administered owing to local antimicrobial resistance concerns. Baseline complete blood counts (CBCs) were obtained before each cycle, and CBC was routinely monitored every other day from day 7 until neutrophil recovery. Granulocyte-colony stimulating factor (G-CSF) was administered when the leukocyte count decreased below 1 × 10⁹/L or the absolute neutrophil count (ANC) decreased below 0.5 × 10⁹/L. Granulocyte-colony stimulating factor treatment was discontinued once neutropenia resolved, defined as a leukocyte count exceeding 2 × 10⁹/L or an ANC above 1 × 10⁹/L. Per institutional protocol, none of the 238 patients received G-CSF prophylaxis prior to the onset of neutropenia. All patients who experienced FN in cycle 1 (57/238, 23.9%) received secondary G-CSF prophylaxis in the following cycles.
Diagnosis and management of FN
Febrile neutropenia was defined as an ANC below 0.5 × 10⁹/L, accompanied by a fever reaching at least 38.3°C or maintaining 38.0°C for over 1 hour. 14 With respect to pathogen identification and treatment of FN, patients underwent blood culture tests from two distinct sites when signs of FN were observed. Following the onset of FN, empirical antibiotics were administered within 60 minutes, in accordance with the current consensus guidelines. 15 For patients who had a positive bacterial culture, their medical records were thoroughly reviewed by internal medicine specialists. The type of bacterial species and the primary site of infection were documented. Samples suspected of contamination were rigorously excluded by the reviewers.
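The FN definition above can be written as a simple predicate. The following is a minimal sketch for illustration only; the `Observation` class and its field names are hypothetical and are not part of the study's data dictionary.

```python
from dataclasses import dataclass

ANC_THRESHOLD = 0.5  # x10^9/L, taken from the definition in the text


@dataclass
class Observation:
    """Hypothetical container for one clinical assessment."""
    anc: float                      # absolute neutrophil count, x10^9/L
    peak_temp_c: float              # highest recorded temperature, degrees C
    minutes_at_or_above_38: float   # time spent at a temperature >= 38.0 C


def meets_fn_definition(obs: Observation) -> bool:
    """Return True when both the neutropenia and the fever criteria are met."""
    neutropenic = obs.anc < ANC_THRESHOLD
    # Fever: a single reading >= 38.3 C, or >= 38.0 C sustained for over 1 hour.
    febrile = obs.peak_temp_c >= 38.3 or obs.minutes_at_or_above_38 > 60
    return neutropenic and febrile
```

For example, `meets_fn_definition(Observation(anc=0.3, peak_temp_c=38.1, minutes_at_or_above_38=90))` evaluates to `True`, whereas the same temperature profile with an ANC of 0.8 × 10⁹/L does not meet the definition.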
Data collection
Logistic regression analysis was performed to identify potential risk factors of FN. The risk factors examined include age, sex, bone marrow involvement, Eastern Cooperative Oncology Group Performance Status (ECOG-PS) score, 16 involvement of extranodal sites, Lugano stage, 17 and lactate dehydrogenase (LDH) level at the time of diagnosis.
Statistical analysis
The sample size was calculated based on bone marrow involvement proportion, which was reported to be able to significantly discriminate FN from non-FN patients, 9 with type I error: 0.05, power: 0.90, and non-FN/FN ratio: 3, resulting in a minimum sample size of 51 and 151 patients for FN and non-FN cohorts, respectively. In this study, after excluding patients not meeting the inclusion criteria, a total of 57 FN and 181 non-FN patients were included for statistical analysis and predictive model design.
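A calculation of this kind can be sketched as follows. The text does not state which formula or which reference proportions were used, so the unpooled normal-approximation formula for comparing 2 proportions with unequal allocation is an assumption here, and the function name is hypothetical. The sketch is not expected to reproduce the reported minimum of 51 and 151 patients.

```python
from math import ceil
from statistics import NormalDist


def two_proportion_sample_size(p_fn, p_non_fn, alpha=0.05, power=0.90, ratio=3.0):
    """Group sizes needed to detect a difference between two proportions.

    p_fn, p_non_fn: assumed proportions with the factor in each group.
    ratio: non-FN/FN allocation ratio.
    Uses a two-sided test and the unpooled normal approximation (assumed).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_fn * (1 - p_fn) + p_non_fn * (1 - p_non_fn) / ratio
    n_fn = ceil((z_alpha + z_beta) ** 2 * variance / (p_fn - p_non_fn) ** 2)
    return n_fn, ceil(ratio * n_fn)


# Illustrative inputs only: the bone marrow involvement proportions observed
# in this cohort (31.5% vs 14.2%), not the values from the cited report.
n_fn, n_non_fn = two_proportion_sample_size(0.315, 0.142)
```

Larger assumed differences between the groups reduce the required sample size, and smaller differences increase it.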
Categorical variables were analyzed using the χ² test or Fisher exact test, as appropriate, whereas continuous variables were evaluated using the Mann-Whitney U test. Overall survival was calculated using the Kaplan-Meier method and compared between the FN and non-FN groups using the log-rank test. Logistic regression and Cox proportional hazards models were employed to identify factors associated with FN and mortality, respectively. Variables with >5% missing values were analyzed using available-case analysis. The findings were reported as odds ratios (ORs) and hazard ratios (HRs) with their respective 95% confidence intervals (CIs). All statistical analyses were carried out using SPSS for Windows, version 26.0. A P value of <.05 was considered statistically significant.
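For readers unfamiliar with the Kaplan-Meier method, the product-limit estimate can be sketched in a few lines. The analyses in this study were run in SPSS; the Python function and the toy follow-up data below are hypothetical and serve only to show how censored observations enter the estimate.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with an observed death.

    times: follow-up durations; events: 1 = death observed, 0 = censored.
    """
    survival = 1.0
    curve = []
    death_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in death_times:
        # Patients still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical toy data (months of follow-up), not study data.
toy_times = [1, 2, 2, 3, 4]
toy_events = [1, 1, 0, 1, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
```

Censored patients contribute to the number at risk until their last follow-up and then leave the denominator without counting as deaths.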
Predictive model design
To confirm the risk factors for FN identified from the univariate analysis, we employed ML techniques to develop predictive models. Given the significant imbalance in our raw data, where the minority class (FN cohort) was substantially smaller than the majority class (non-FN cohort), we used the Synthetic Minority Over-sampling Technique (SMOTE) to equalize the number of patients in both classes by upsampling the minority class. 18 Subsequently, in addition to logistic regression, ML methods including random forest (RF), K-nearest neighbor (KNN), and support vector machine (SVM) were used to construct the predictive models on the Python platform. Tenfold cross-validation was employed for both training and validating these models. The predictive performance of the models was evaluated according to their sensitivity, specificity, accuracy, and area under the receiver operating characteristic curve (AUC).
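The core idea of SMOTE, generating synthetic minority-class points by interpolating between a minority sample and one of its nearest minority neighbors, can be sketched as follows. This is a simplified illustration using only the standard library, not the implementation used in this study; the function name, parameters, and toy points are hypothetical.

```python
import math
import random


def smote_upsample(minority, n_target, k=5, seed=0):
    """Add synthetic points to `minority` until it holds `n_target` samples.

    minority: list of numeric feature tuples (at least 2 points).
    k: number of nearest minority neighbours considered per base point.
    """
    rng = random.Random(seed)
    synthetic = []
    while len(minority) + len(synthetic) < n_target:
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the base point.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to mate
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return list(minority) + synthetic


# Hypothetical 2-feature minority samples, upsampled from 4 to 10 points.
toy_minority = [(0.0, 1.0), (1.0, 2.0), (2.0, 1.5), (1.5, 0.5)]
balanced = smote_upsample(toy_minority, n_target=10, k=3)
```

When oversampling is combined with cross-validation, it is generally applied to the training folds only so that synthetic points do not leak into the held-out data; the text does not state how this was handled here.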
Results
Patient characteristics
Febrile neutropenia occurred in 57 patients, yielding an incidence rate of 23.9%. Compared with the non-FN group (n = 181), the FN group (n = 57) included a higher proportion of patients with stage III and IV DLBCL (79.0% vs 52.5%; P < .001), with bone marrow involvement (31.5% vs 14.2%; P = .004), and with an ECOG-PS score of ⩾2 (17.5% vs 7.2%; P = .021), and had higher serum LDH levels (627.16 ± 946.34 vs 423.38 ± 512.80 U/L; P < .001) (Table 1).
Patient characteristics.
Abbreviations: FN: febrile neutropenia; SD: standard deviation; R-IPI: revised international prognostic index; ECOG: Eastern Cooperative Oncology Group; LDH: lactate dehydrogenase.
Mann-Whitney U test.
χ² test.
Poorly controlled diabetes was defined as HbA
Fisher exact test.
Chronic kidney disease was defined as an eGFR < 60 mL/min/1.73 m².
Risk factors of FN
Univariate analysis revealed that the following factors were significantly associated with FN: bone marrow involvement (OR, 2.78; 95% CI, 1.36-5.66; P = .005), advanced disease stage (ie, Lugano stage III and IV disease) (OR, 3.39; 95% CI, 1.69-6.84; P = .001), ECOG-PS score of ⩾2 (OR, 2.75; 95% CI, 1.13-6.67; P = .025), elevated LDH levels (⩾240 U/L) (OR, 2.68; 95% CI, 1.35-5.32; P = .005), and involvement of 2 or more extranodal sites (OR, 2.69; 95% CI, 1.44-5.02; P = .002). However, none of these factors was independently associated with FN in the multivariate analysis (Table 2).
Risk factors for febrile neutropenia.
Abbreviations: ECOG: Eastern Cooperative Oncology Group; LDH: lactate dehydrogenase; OR: odds ratio; CI: confidence interval.
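For a single binary factor, the univariate OR and its Wald 95% CI follow directly from the 2 × 2 table. The sketch below is illustrative and assumes cell counts reconstructed from the reported percentages and group sizes (for example, 79.0% of 57 FN patients corresponds to 45 patients with stage III and IV disease); these counts are not reported as such and are labeled as assumptions in the code.

```python
import math


def odds_ratio_wald_ci(factor_fn, no_factor_fn, factor_non_fn, no_factor_non_fn, z=1.96):
    """Odds ratio and Wald confidence interval from 2 x 2 cell counts."""
    odds_ratio = (factor_fn * no_factor_non_fn) / (no_factor_fn * factor_non_fn)
    se_log_or = math.sqrt(
        1 / factor_fn + 1 / no_factor_fn + 1 / factor_non_fn + 1 / no_factor_non_fn
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Assumed counts back-calculated from 79.0% of 57 and 52.5% of 181.
stage_or = odds_ratio_wald_ci(45, 12, 95, 86)   # approximately 3.39 (1.69-6.84)

# Assumed counts back-calculated from 17.5% of 57 and 7.2% of 181.
ecog_or = odds_ratio_wald_ci(10, 47, 13, 168)   # approximately 2.75 (1.13-6.67)
```

Under these assumed counts, the results agree with the univariate estimates reported for advanced stage and ECOG-PS score of ⩾2.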
Pathogen identification during FN
Among the 57 patients who experienced FN, 24 (42.1%) patients showed positive microbiological findings, with only 12 (21.1%) confirmed cases of bacteremia. Most bloodstream infections were attributed to gram-negative bacilli (92.3%, n = 12). Escherichia coli (n = 3) and Pseudomonas aeruginosa (n = 4) were the predominant pathogens identified. Only a single case of bacteremia was caused by gram-positive cocci (n = 1). Notably, one patient concurrently presented with bacteremia caused by Pseudomonas putida and Enterococcus cloacae complex (Figure 2).

Among the 57 patients experiencing febrile neutropenia, 24 (42.1%) patients show positive microbiological findings, with only 12 (21.1%) patients confirmed to have bacteremia. Most bloodstream infections are attributable to gram-negative bacilli, accounting for 92.3% (n = 12) of the cases. Escherichia coli (n = 3) and Pseudomonas aeruginosa (n = 4) are the predominant pathogens identified. Only a single case of bacteremia is caused by gram-positive cocci (n = 1).
Effect of FN on OS
The FN group had significantly inferior OS compared with the non-FN group (Figure 3). The 5-year OS rates for the FN and non-FN groups were 57.6% and 77.1%, respectively (P = .007). Univariate analysis confirmed that FN (HR, 2.04; 95% CI, 1.20-3.49; P = .009), age ⩾60 years (HR, 2.01; 95% CI, 1.16-3.47; P = .013), bone marrow involvement (HR, 1.99; 95% CI, 1.12-3.55; P = .020), Lugano stage III and IV disease (HR, 3.48; 95% CI, 1.84-6.57; P < .001), ECOG-PS score of ⩾2 (HR, 3.62; 95% CI, 1.92-6.83; P < .001), elevated LDH levels (⩾ 240 U/L) (HR, 2.97; 95% CI, 1.57-5.59; P = .001), and involvement of 2 or more extranodal sites (HR, 1.72; 95% CI, 1.02-2.90; P = .043) were significantly associated with mortality (Table 3).

The 5-year overall survival rates for patients with and without febrile neutropenia are 57.6% and 77.1%, respectively (P = .007).
Risk factors for mortality.
Abbreviations: ECOG: Eastern Cooperative Oncology Group; LDH: lactate dehydrogenase; HR: hazard ratio; CI: confidence interval.
Multivariate analysis showed that age ⩾60 years (HR, 1.80; 95% CI, 1.02-3.18; P = .043), stages III and IV (HR, 2.23; 95% CI, 1.06-4.69; P = .034), and ECOG-PS score of ⩾2 (HR, 2.56; 95% CI, 1.29-5.08; P = .007) were independently associated with a higher risk of mortality. Febrile neutropenia (HR, 1.39; 95% CI, 0.78-2.45; P = .263) was not significantly associated with mortality in this analysis (Table 3).
ML for FN prediction
To improve the prediction of FN in patients with DLBCL receiving their initial R-CHOP treatment, we developed ML models based on 6 clinical variables identified through univariate analysis and Cox proportional hazards models, namely, age, bone marrow involvement, Lugano stage III and IV disease, ECOG-PS score of ⩾2, LDH level, and extranodal involvement of at least 2 sites. Supplemental Table 1 displays the predictive capabilities of the developed models. In cross-validation, the AUCs (C-statistics) were 0.752 (95% CI, 0.702-0.802) for logistic regression, 0.879 (95% CI, 0.843-0.915) for random forest, 0.774 (95% CI, 0.726-0.822) for KNN, and 0.839 (95% CI, 0.798-0.880) for SVM. In the independent training and testing experiment, the corresponding values were 0.775 (95% CI, 0.727-0.823), 0.861 (95% CI, 0.822-0.900), 0.692 (95% CI, 0.638-0.746), and 0.813 (95% CI, 0.769-0.857) (Supplemental Table 1 and Supplemental Figure 1). These results suggest that predictive models with moderate to strong performance can be developed using 6 identified clinical variables (Figure 4).

The areas under the curve (AUCs) for the logistic regression, random forest, K-nearest neighbor, and support vector machine models are 0.7518, 0.8792, 0.7739, and 0.8391, respectively.
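The AUC (C-statistic) reported for each model can be read as the probability that a randomly chosen patient with FN receives a higher predicted score than a randomly chosen patient without FN. The following minimal sketch computes it by pairwise comparison, together with sensitivity and specificity at a fixed cutoff; the function names, the 0.5 cutoff, and the toy labels and scores are hypothetical and do not come from the study data.

```python
def auc_c_statistic(labels, scores):
    """Fraction of (FN, non-FN) pairs ranked correctly; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    concordant = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return concordant / (len(pos) * len(neg))


def sensitivity_specificity(labels, scores, threshold=0.5):
    """Sensitivity and specificity when scores >= threshold are called FN."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy predictions: 1 = FN, 0 = non-FN.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
toy_auc = auc_c_statistic(toy_labels, toy_scores)   # 11 of 12 pairs concordant
toy_sens, toy_spec = sensitivity_specificity(toy_labels, toy_scores)
```

Unlike sensitivity and specificity, the AUC does not depend on the choice of cutoff, which is why it is commonly used to compare models.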
Discussion
In our cohort, 23.9% of newly diagnosed patients with DLBCL developed FN during their first cycle of R-CHOP therapy. Univariate analysis identified bone marrow involvement, Lugano stage III-IV, ECOG-PS ⩾2, LDH ⩾240 U/L, and ⩾2 extranodal sites as risk factors for FN, although none was independently associated with FN in multivariate analysis. Although FN did not independently predict poorer OS in multivariate analysis, patients experiencing FN after their first R-CHOP treatment had a significantly worse OS than those without FN. Machine learning techniques yielded predictive models for FN with moderate to strong performance.
Febrile neutropenia is a potentially fatal complication of myelosuppressive chemotherapy. The incidence rate of FN following R-CHOP chemotherapy in patients with NHL ranges from 19% to 22%,8,19,20 consistent with our cohort. Kim et al 21 reported that prophylactic G-CSF reduced FN but did not improve overall mortality. Moreover, antibiotic prophylaxis in DLBCL remains controversial due to antimicrobial resistance concerns. Hence, identifying risk factors for FN in patients with DLBCL undergoing chemotherapy is essential to guide selective use of G-CSF and antibiotics.
The current study revealed that patients with advanced stage, elevated LDH, bone marrow involvement, involvement of 2 or more extranodal sites, and poor ECOG-PS had a higher FN risk during the first R-CHOP cycle. Furthermore, patients who developed FN in the first cycle had a significantly lower 5-year OS rate. However, these associations did not persist in multivariate analysis, suggesting that the factors influencing FN are complex. These findings are consistent with the revised International Prognostic Index, wherein poorer prognosis correlates with higher scores. 22
Different factors have been reported to be associated with FN in DLBCL. Zheng et al 8 identified age ⩾65 years, albumin <3.5 g/dL, and bone marrow involvement as independent predictors. Similarly, Choi et al 9 reported that age, sex, comorbid conditions, stage, International Prognostic Index score, bone marrow involvement, and low serum albumin levels were risk factors for FN. These reports partially validate our findings. In addition, an average relative dose intensity of R-CHOP exceeding 80% is associated with an increased incidence of FN. 8 Given the complexity of FN prediction in DLBCL, artificial intelligence and ML may help address this gap.
Artificial intelligence and ML are reshaping health care by enabling predictive models for diagnosis and treatment. However, class imbalance is common in medical datasets, with rare but clinically critical cases far fewer than typical cases.23,24 In our dataset, the majority-to-minority ratio was 3.18, comparable to prior reports.8,19,20 Synthetic Minority Over-sampling Technique is commonly used to up-sample the minority class in medical prediction models. 25
In this study, to prevent bias toward the non-FN cohort, we used SMOTE to up-sample FN cases to match non-FN cases. The results indicated that robust models could be developed using RF (AUC = 0.818) and KNN (AUC = 0.818), while models developed with logistic regression (AUC = 0.786) and SVM (AUC = 0.774) were moderately effective. This suggests that the 5 extracted risk factors were valuable for designing a predictive model for FN. Although age was not a significant factor (P = .08) for distinguishing between the FN and non-FN patients, including age along with the 5 extracted factors in the predictive models improved their performance, with AUCs of 0.879 for RF and 0.839 for SVM. This suggests that filter methods, such as univariate or multivariate analysis, may be suboptimal. Instead, wrapper or embedded methods may better identify effective variable combinations. 26
Regarding microbiological evidence in FN, a prior retrospective study found positive results in only 10% of events. 9 In our study, 24/57 (42.1%) FN cases had positive microbiology, including 12 (21.1%) with confirmed bacteremia. Notably, gram-negative bacteria were identified in 92.3% of the patients with bacteremia, a rate higher than the 57% reported by Chen et al. 27 These results suggest that the pathogens associated with FN may vary by region and evolve over time. Surveillance of infectious epidemiology is crucial and requires regular updates across different institutions.
This study has some limitations. First, the use of retrospective data from medical records led to less-than-optimal assessment of infection focus in patients with positive microbiological evidence. Second, predictive models developed solely from our hospital dataset may have introduced bias and may not be generalizable to patients in other health care settings. External validation using independent datasets is required to confirm the robustness of the proposed model. Moreover, predictive performance could be enhanced by increasing the sample size to achieve a more balanced distribution of FN and non-FN cases and by applying wrapper or embedded methods for variable selection. This approach could lead to more effective ML models for predicting FN in patients with DLBCL.
Conclusion
The incidence rate of FN in patients newly diagnosed with DLBCL undergoing their first cycle of R-CHOP therapy is 23.9%, with gram-negative bacteria remaining the predominant pathogen. The potential risk factors for FN include age ⩾60 years, bone marrow involvement, Lugano stage III and IV, an ECOG-PS score ⩾2, elevated LDH levels (⩾240 U/L), and the presence of extranodal involvement at 2 or more sites. Machine learning techniques can provide predictive models for FN with moderate to strong performance. Finally, patients who experience FN after their first R-CHOP treatment have significantly worse OS than those who do not. Thus, tailored prophylaxis with G-CSF and antibiotics may be essential for patients with DLBCL receiving their first R-CHOP treatment.
Supplemental Material
sj-docx-1-onc-10.1177_11795549261423995 – Supplemental material for Risk Factors for Febrile Neutropenia in Patients With Newly Diagnosed Diffuse Large B-Cell Lymphoma Undergoing Initial R-CHOP Treatment
Supplemental material, sj-docx-1-onc-10.1177_11795549261423995 for Risk Factors for Febrile Neutropenia in Patients With Newly Diagnosed Diffuse Large B-Cell Lymphoma Undergoing Initial R-CHOP Treatment by Liang-Ying Chen, Che-An Tsai, Po-Wei Liao, Ling-Chiao Teng, Cheng-Hsien Lin, Yu-Chen Su and Chieh-Lin Jerry Teng in Clinical Medicine Insights: Oncology
Footnotes
Acknowledgements
The authors would like to express their appreciation to Dr Hsuan-Hung Lin for his assistance in designing the AI predictive models.
Ethical considerations
Participant details were de-identified, and the study was conducted according to the current version of the Declaration of Helsinki. The Institutional Review Board of Taichung Veterans General Hospital approved the study (CE2298A).
Consent to participate
Written informed consent was waived by the Institutional Review Board of Taichung Veterans General Hospital (CE2298A) due to the retrospective nature of the study.
Consent for publication
All authors agreed to submit the manuscript for review and publication in Clinical Medicine Insights: Oncology.
Author contributions
Liang-Ying Chen: Conceptualization, Investigation, Formal analysis. Che-An Tsai: Conceptualization, Methodology, Investigation, Writing—review & editing. Po-Wei Liao: Methodology, Investigation. Ling-Chiao Teng: Conceptualization, Writing—review & editing. Cheng-Hsien Lin: Investigation, Formal analysis. Yu-Chen Su: Investigation, Formal analysis. Chieh-Lin Jerry Teng: Conceptualization, Supervision, Writing—review & editing. All authors gave final approval of the manuscript.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was partially supported by grants from National Science and Technology Council (NSTC 112-2314-B-005-011) and Taichung Veterans General Hospital (TCVGH-YM1130113).
Declaration of conflicting interests
The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Chieh-Lin Jerry Teng received honorarium and consulting fees from Novartis, Roche, Pfizer, Takeda, Johnson and Johnson, Amgen, BMS Celgene, Kirin, and MSD. The other authors have no conflicts of interest.
Data availability statement
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
Supplemental material
Supplemental material for this article is available online.