Abstract
Objective
This study aims to develop a prognosis prediction model and visualization system for acute paraquat poisoning based on an improved machine learning model.
Methods
A total of 101 patients with acute paraquat poisoning admitted to 6 hospitals from March 2020 to March 2022 were selected for this study. After the treatment period ended (one year of follow-up for survivors and up to the time of death for deceased patients), they were categorized into the survival group (n = 37) and death group (n = 64). The biochemical indexes of the patients were analyzed, and a prognosis prediction model was constructed using HHO-XGBoost, an improved machine-learning algorithm. Multivariate logistic analysis was used to verify the value of the features selected by the model.
Results
Seven features were selected in the HHO-XGBoost model: oral dose, serum creatinine, alanine aminotransferase (ALT), white blood cell (WBC) count, neutrophil count, urea nitrogen level, and prothrombin time. Univariate analysis showed statistically significant differences in these features between the survival and death groups (P < 0.05). Multivariate logistic analysis identified four features significantly associated with prognosis (serum creatinine level, oral dose, ALT level, and WBC count), indicating their importance in predicting outcomes.
Conclusion
The machine-learning-based HHO-XGBoost model is valuable for constructing a prognosis prediction model and visualization system for acute paraquat poisoning and can support clinical prognosis prediction in patients with paraquat poisoning.
Keywords
Introduction
Paraquat, a widely used herbicide, is known for its high toxicity, particularly affecting the lungs as the main target organ. This leads to pulmonary interstitial disease and interstitial pulmonary fibrosis, which often result in patient death. Despite the severity of paraquat poisoning, there is an acknowledged gap in the development of effective treatments, making it a persistent challenge in toxicology.1–3 The majority of paraquat poisoning cases result from oral ingestion, while some are caused by skin exposure and inhalation. Upon absorption into the human body, paraquat is rapidly distributed to various organs, with the lungs showing the highest concentration, which can be 10 to 90 times higher than other tissues and organs.4–7 This organ-specific vulnerability highlights the need for a targeted approach to understanding and treating paraquat poisoning. The continuous occurrence of paraquat poisoning has prompted extensive clinical and basic research globally.8–11 However, there is a clear gap in knowledge regarding the integration of advanced analytical techniques, such as machine learning, into the study of paraquat poisoning. Studies on the mechanism of paraquat poisoning and damage have revealed the association of oxidative stress, including the generation of reactive oxygen species (ROS) and other immune stimulatory effects, with tissue damage and intracellular calcium overload.12–14 Early diagnosis and treatment are identified as key factors in managing paraquat poisoning.15–17
In recent years, the field of medicine has seen the emergence of machine learning, an artificial intelligence algorithm that has gained prominence for its continuous organization, adaptation, and learning capabilities during training.18–20 Machine learning techniques have been effectively employed for multivariate regression modeling, discerning intricate non-linear classifications, and various other applications in medicine.21–22 However, the application of machine learning to the early detection and prognosis prediction of paraquat poisoning has been limited, representing a significant gap in the literature. One critical aspect of machine learning in medical applications is the explainability of the models. Explainability refers to the ability to understand and interpret the decisions made by machine learning algorithms. This is particularly important in clinical settings, where transparency and trust in the model's predictions are essential for adoption by healthcare professionals. Explainable machine learning models can provide insights into which features are most influential in predicting outcomes, thereby aiding clinicians in making informed decisions and understanding the underlying mechanisms of paraquat poisoning. 23
The objective of this study is to address this gap by investigating the development of a prognosis prediction model and visualization system for acute paraquat poisoning based on an enhanced machine learning approach. This work aims to facilitate clinical diagnosis and prognostic assessment of acute paraquat poisoning, potentially offering a novel perspective and tools to improve patient outcomes in this challenging area of toxicology.
Data and methods
Patient information
This was a retrospective study. A total of 101 patients with acute paraquat poisoning were selected from 6 hospitals between March 2020 and March 2022. After the treatment period ended (one year of follow-up for survivors and up to the time of death for deceased patients), they were categorized into the survival group (n = 37) and death group (n = 64). Prior to implementation, the study was approved by the ethics committees of the six hospitals [Ethics committee of the 945th Hospital of the Joint Logistics Support Force of the Chinese People's Liberation Army (Ethical Application Ref: 202200015); Ethics committee of Mingshan District People's Hospital of Ya'an (Ethics Review 2022 No. 18); Medical Ethics Committee of Ya'an Polytechnic College Affiliated Hospital (Ethics Review 2022 No. 023); Ethics committee of Yucheng District People's Hospital of Ya'an (No. 202204032); Ethical review opinions of Ya'an People's Hospital (No. 2022–0014); Ethics committee of Ya'an Traditional Chinese Medicine Hospital (2022 No. 021)]. Because this was a retrospective analysis, the ethics committees of the six hospitals waived the requirement for patient informed consent, and the study was conducted in accordance with the World Medical Association Declaration of Helsinki.
Inclusion criteria: (1) fulfilment of the diagnostic criteria for paraquat poisoning, encompassing clinical presentation, patient history, laboratory tests (serum paraquat levels, urine paraquat test), and imaging tests;24 (2) no congenital immune dysfunction; (3) age > 18 years. Exclusion criteria: (1) excretory organ disorders; (2) concurrent poisoning with other substances; (3) more than 24 h elapsed since poisoning.
Treatment protocol
All patients underwent lavage with sodium bicarbonate solution (Sinopod Rongsheng Pharmaceutical Co., Ltd, Sinopod Approval number H20013007, size: 250 mL). Then 120–200 mL of 15% white clay was administered orally as an initial dose to bind the ingested paraquat and reduce absorption. Subsequently, patients received 50 mL of mannitol plus 6 g of rhubarb powder and 50 g of manganese (to induce diarrhea) orally every 4 h for 24 h after gavage to facilitate clearance of the toxin from the gastrointestinal tract.
In addition, haemoperfusion and haemodialysis were part of the treatment regimen. Immunosuppressive agents were administered by injection: cyclophosphamide (Zhejiang Hai Zheng Pharmaceutical Co., Ltd, State Drug Standard H20084627, specification: 50 mg) at a dose of 50 mg every 24 h, and methylprednisolone (commonly used as a substitute for "Eugene") at a dose of 4 mg per kg of body weight once a day for the first 3 days, with the dose adjusted according to the patient's clinical response and laboratory values.
Data on alanine aminotransferase, serum creatinine, white blood cell count, neutrophil count, blood urea nitrogen, and coagulation indices were collected as early as possible after admission (usually from the first urgent tests performed at emergency admission).
The detection methods for each index were as follows. ALT levels were determined using an automatic clinical biochemical analyzer after drawing 2–3 mL of blood from a superficial vein and centrifuging it to obtain the serum supernatant. Serum creatinine and blood urea nitrogen concentrations were measured using a Pran blood gas analyzer after drawing 1–2 mL of blood from a superficial vein into routine EDTA anticoagulant or heparinized test tubes. Prothrombin time was determined using the coagulation method. WBC and neutrophil counts were measured using a TEK8520 automatic five-classification blood cell analyzer.
Statistical analysis
Construction of the HHO algorithm and the synchronous optimization model: In this study, the global optimization capability of a swarm intelligence algorithm was used to perform feature screening and hyperparameter optimization of the machine learning model in a single, synchronized step. The Harris hawks optimization (HHO) algorithm was chosen because of its robust global search capability and the minimal parameter tuning it requires. On top of the basic algorithm, Tent chaotic initialization was introduced to distribute the initial population of Harris hawks evenly across the search space. In addition, a lens-imaging opposition-based learning strategy was employed to improve the algorithm's ability to escape from local optima. As shown in Figure 1, HHO was used to optimize the XGBoost model, yielding a hybrid HHO-XGBoost model. This model was compared with the traditional XGBoost model using the 101 samples. Of these, 80% (n = 81) were randomly assigned to the training set, and the remaining 20% (n = 20) formed the test set. To limit overfitting, 5-fold cross-validation was performed on the training set.
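The two modifications to the basic HHO algorithm can be illustrated with a minimal sketch. It assumes a common textbook form of each technique, not necessarily the exact variant used in this study: a tent map with peak position `a = 0.7` (chosen here to avoid the floating-point collapse of the symmetric map) and the lens-imaging reverse (opposition-based) learning formula with a scaling factor `k`. The function names, `a`, `k`, and the seed are all illustrative assumptions.

```python
import random


def tent_map(x, a=0.7):
    """One step of the tent map on [0, 1]; `a` is the (assumed) peak position."""
    return x / a if x < a else (1.0 - x) / (1.0 - a)


def tent_init(n_hawks, lower, upper, seed=0):
    """Spread an initial population over [lower, upper] using a tent sequence."""
    rng = random.Random(seed)
    # One chaotic state per dimension, started away from the fixed points 0 and 1.
    z = [rng.uniform(0.05, 0.95) for _ in lower]
    population = []
    for _ in range(n_hawks):
        z = [tent_map(v) for v in z]
        population.append([lo + v * (hi - lo) for v, lo, hi in zip(z, lower, upper)])
    return population


def lens_opposite(x, lower, upper, k=1.5):
    """Lens-imaging opposite of x; k = 1 reduces to plain opposition lo + hi - x."""
    out = []
    for v, lo, hi in zip(x, lower, upper):
        opp = (lo + hi) / 2.0 + (lo + hi) / (2.0 * k) - v / k
        out.append(min(max(opp, lo), hi))  # clip back into the search bounds
    return out
```

In a full optimization loop, a candidate and its lens opposite would both be scored by the fitness function, and the better of the two would be kept.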

Schematic diagram of synchronous optimization.
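One way to picture synchronous optimization is that each hawk's position vector carries both the feature mask and the hyperparameters, so a single search updates them together. The sketch below assumes positions in [0, 1], a 0.5 selection threshold, and illustrative ranges for three XGBoost hyperparameters (`max_depth`, `learning_rate`, `n_estimators`); the paper does not report its encoding or ranges, so these are hypothetical.

```python
# Hypothetical hyperparameter ranges (not reported in the paper).
PARAM_RANGES = {
    "max_depth": (2, 8),
    "learning_rate": (0.01, 0.3),
    "n_estimators": (50, 300),
}
INT_PARAMS = {"max_depth", "n_estimators"}


def decode(position, feature_names, threshold=0.5):
    """Split one position vector in [0, 1]^d into a feature subset and hyperparameters."""
    n_feat = len(feature_names)
    if len(position) != n_feat + len(PARAM_RANGES):
        raise ValueError("position length must equal n_features + n_hyperparameters")
    # The first n_feat entries act as a soft mask: a feature is kept above the threshold.
    selected = [name for name, v in zip(feature_names, position) if v > threshold]
    # The remaining entries are rescaled linearly into each hyperparameter range.
    params = {}
    for (name, (lo, hi)), v in zip(PARAM_RANGES.items(), position[n_feat:]):
        value = lo + v * (hi - lo)
        params[name] = int(round(value)) if name in INT_PARAMS else value
    return selected, params
```

The fitness of a position would then be the cross-validated score of a model trained on `selected` with `params`, which is what lets feature screening and tuning share one search.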
Statistical analysis was performed using SPSS 26.0 software. Measurement data that followed a normal distribution were expressed as mean ± standard deviation (x̄ ± s) and analyzed using the t-test. Count data were expressed as percentages (%) and analyzed using the χ2 test or Fisher's exact probability test. P < 0.05 was considered statistically significant.
Results
The two groups did not differ significantly in gender, age, body mass index, or pre-hospital care (P > 0.05). The poisoning-to-treatment time was longer and the duration of therapy was shorter in the death group than in the survival group (P < 0.05); the time to death in the death group was 2–24 h, with a mean of 11.33 ± 6.41 h. Serum paraquat concentration at admission was also significantly higher in the death group than in the survival group. Importantly, the number of hemoperfusion (HP)/continuous renal replacement therapy (CRRT) treatments was significantly higher in the survival group than in the death group. HP/CRRT is an important tool for the effective removal of paraquat from serum and may be closely related to patient survival (Table 1).
Clinical information of survival and death groups.
Analysis of the laboratory indices at admission showed that most patients had mild hypoxia. PaO2 was slightly higher in the survival group than in the death group, and lactate (Lac) and base excess (BE) were lower in the survival group than in the death group (P < 0.05); the remaining indices did not differ significantly at admission (Table 2).
Laboratory indexes of survival and death groups.
The patients in the death group had rising arterial blood lactate values and falling base excess values at admission, 12 h, and 24 h, and the differences from the survival group were statistically significant (P < 0.05) (Table 3).
Laboratory indexes of survival and death groups.
Improved HHO algorithm performance test
The performance of the improved HHO algorithm was evaluated using a set of 23 standard test functions. A comparative analysis was performed with other optimization algorithms, including the genetic algorithm (GA), particle swarm optimization (PSO), the multi-verse optimizer (MVO), and the sparrow search algorithm (SSA). The results show that the improved HHO algorithm has clear performance advantages (Figure 2).

Search space and convergence curve of the base function.
Construction of synchronous optimization prediction model
The ROC-AUC and PR-AUC of the HHO-XGBoost model were 0.9433 and 0.9720 on the training set and 0.9167 and 0.9583 on the test set, higher than those of the other three models (LR, SVM, and XGBoost). Ultimately, seven features were selected for the HHO-XGBoost model: oral dose, serum creatinine level, ALT level, white blood cell count, neutrophil count, urea nitrogen level, and prothrombin time. The training-set and test-set results of the model are shown in Tables 4 and 5, respectively, and are depicted in Figures 3 and 4.
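For readers less familiar with the two metrics, the sketch below shows one common way to compute them from predicted death probabilities. It is an illustration only: ROC-AUC is computed as the pairwise ranking probability, and PR-AUC is approximated by average precision, which may differ from the estimator used in this study. The function names are illustrative.

```python
def roc_auc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve (assumes no tied scores)."""
    ranked = sorted(zip(scores, y_true), key=lambda t: -t[0])
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("at least one positive case is required")
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at each recalled positive
    return total / n_pos
```

Because deaths outnumber survivors in this cohort (64 versus 37), reporting PR-AUC alongside ROC-AUC gives a view of performance that is more sensitive to the class balance.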

The comparative performance of the model on the training set.

The comparative performance of the model on the test set.
Training set.
LR: Logistic Regression; SVM: Support Vector Machine; XGBoost: eXtreme Gradient Boosting; HHO-XGBoost: Harris hawks optimization-eXtreme Gradient Boosting.
Test set.
LR: Logistic Regression; SVM: Support Vector Machine; XGBoost: eXtreme Gradient Boosting; HHO-XGBoost: Harris hawks optimization-eXtreme Gradient Boosting.
Univariate and multivariate analysis of predictive indicators
Univariate analysis of the seven features screened by HHO-XGBoost showed statistically significant differences between the survival and death groups (P < 0.05), supporting the relevance of the variables selected by HHO-XGBoost (Table 6). Multivariate analysis was performed using stepwise logistic regression, and four features entered the equation: serum creatinine, oral dose, ALT, and WBC count. All four were among the variables selected by HHO-XGBoost (Table 7).
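For orientation, a multivariate logistic model with the four retained features has the general form below. The symbols are notation introduced here for illustration, and the fitted coefficients are those reported in Table 7, which are not reproduced.

```latex
\ln\frac{P(\text{death})}{1 - P(\text{death})}
  = \beta_0 + \beta_1 x_{\text{Cr}} + \beta_2 x_{\text{dose}}
  + \beta_3 x_{\text{ALT}} + \beta_4 x_{\text{WBC}},
\qquad \mathrm{OR}_j = e^{\beta_j}
```

Each odds ratio describes how the odds of death change per unit increase in the corresponding feature, with the other three held fixed.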
Univariate analysis of survival and death groups.
ALT: Alanine Aminotransferase; WBC: White Blood Cell.
Multivariate analysis of survival and death groups.
ALT: Alanine Aminotransferase; WBC: White Blood Cell.
Visualization system construction
As shown in Figures 5 and 6, the visualization system can directly output the risk of patient death after inputting patient-related indicators.
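The input-to-risk step of such a system can be sketched as follows. This is a hypothetical outline, not the authors' implementation: the field names, the `predict_proba` callable standing in for the trained model, and the 0.5 cutoff separating low-risk from high-risk output are all assumptions.

```python
# Hypothetical field names for the seven selected features.
REQUIRED = (
    "oral_dose", "serum_creatinine", "alt", "wbc",
    "neutrophils", "urea_nitrogen", "prothrombin_time",
)


def assess(indicators, predict_proba, cutoff=0.5):
    """Validate the seven inputs, call the model, and label the result."""
    missing = [k for k in REQUIRED if k not in indicators]
    if missing:
        raise ValueError(f"missing indicators: {missing}")
    row = [float(indicators[k]) for k in REQUIRED]  # fixed feature order for the model
    p = predict_proba(row)  # stand-in for the trained model's death probability
    if not 0.0 <= p <= 1.0:
        raise ValueError("model must return a probability in [0, 1]")
    label = "high risk" if p >= cutoff else "low risk"  # cutoff is an assumption
    return {"death_probability": p, "label": label}
```

Rejecting incomplete input before the model is called keeps a missing laboratory value from silently producing a risk estimate.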

The low-risk outcome of death.

The high-risk outcome of death.
Discussion
Paraquat is used widely around the world because of its efficacy, cost-effectiveness, and minimal environmental impact. However, paraquat poisoning has a profound impact on daily life, and studies continue to report severe organ damage, especially to the lungs, which progresses to pulmonary fibrosis and asphyxia, with high mortality rates even in conscious patients.25–28 These outcomes highlight the urgent need for improved diagnostic and therapeutic strategies, and it is this gap that our study seeks to fill. Our findings contribute to existing research by emphasizing the urgency of timely evaluation and risk assessment of patients with paraquat poisoning, which is essential for accurate diagnosis, effective treatment, and integration of public healthcare services with social welfare measures.29 The gold standard for diagnosing paraquat poisoning is the detection of paraquat or its metabolites in biological samples, but this approach was not used in our study and remains an area for future exploration.30,31
Applying machine learning to medical prognosis opens new avenues for predictive modelling. Our study used the XGBoost algorithm, a leading open-source tool known for its computational efficiency and ability to manage large datasets, thus overcoming the limitations of manual processing and improving the accuracy of clinical predictions.32–36 This approach is critical given the lack of effective treatment for paraquat poisoning and the high mortality rate.
The HHO-XGBoost model used in our study identified seven key features significantly associated with the prognosis of paraquat poisoning: oral dose, serum creatinine, alanine aminotransferase, white blood cell count, neutrophil count, urea nitrogen, and prothrombin time. These findings are consistent with previous studies that showed a correlation between the ingested dose of paraquat and patient prognosis, with higher ingested doses associated with higher mortality.37 In addition, our study confirmed the importance of serum creatinine as an indicator of renal function and its relationship with prognosis, which has also been emphasized in the literature.38 The role of paraquat in inducing oxidative stress and inflammation has been well documented, and our findings on ALT and leukocyte levels further support the existing understanding of the molecular mechanisms of paraquat and its effect on cytokine levels.39,40 Disruption of the organism's redox system and the subsequent inflammatory response play a crucial role in the development of acute lung injury and pulmonary fibrosis; monitoring these biomarkers is therefore essential for patient management.
Conclusion
In acute paraquat poisoning, the machine-learning-based HHO-XGBoost model is valuable for developing prognostic prediction models and visualization systems and can support prediction of the clinical prognosis of patients with paraquat poisoning. Entering a patient's test indexes obtained within 24 h into the visualization system yields a mortality risk assessment, which can help clinicians guide follow-up treatment. However, owing to the retrospective design, this study has some limitations regarding sample representativeness, and the limited sample size makes it difficult to represent the entire target population accurately. In addition, the prognosis of patients with paraquat poisoning may be affected by other factors not considered in the observational indicators used in this study, which may affect the interpretation and extrapolation of the findings. Future studies could take the following steps to improve the experimental design. First, we suggest adopting a more comprehensive approach to sample selection and increasing the sample size to improve the representativeness and generalizability of the findings.
In addition, future research should cover a broader range of indicators and variables to provide a more comprehensive understanding of the research topic. This expansion could produce more precise results and facilitate in-depth reasoning. Finally, prospective studies would enhance the control of variables, improve the management of the dynamics of observations, and strengthen causal inferences.
Strengths and limitations of this study
Strengths of this study
Innovative Approach: The study adopts an improved machine learning model for constructing a prognosis prediction model and visualization system for acute paraquat poisoning, showcasing innovation in applying advanced techniques in the medical field. Prognosis Prediction: By utilizing machine learning, the study offers a promising method for predicting the prognosis of acute paraquat poisoning, which can potentially assist healthcare professionals in making timely and informed decisions for patient care. Visualization System: Developing a visualization system enhances the accessibility and usability of the prognosis prediction model, providing a user-friendly interface for medical practitioners to interpret and utilize the predictive outcomes effectively. Potential for Clinical Impact: Implementing this model and system can improve patient outcomes by facilitating early identification of severe cases and guiding appropriate treatment strategies, thereby contributing to better management of acute paraquat poisoning.
Limitations of this study
Data Limitations: The study's effectiveness may be constrained by the availability and quality of the data used for training the machine learning model. Insufficient or biased data could impact the accuracy and generalizability of the predictions. Model Interpretability: Machine learning models, especially complex ones, may lack interpretability, making it challenging for healthcare providers to understand the underlying reasons for specific prognostic predictions, which could hinder the model's adoption in clinical practice. Validation and External Generalization: The study's findings may require further validation and external generalization across different healthcare settings or populations to ensure the reliability and robustness of the prognosis prediction model for acute paraquat poisoning.
Footnotes
Availability of data and materials
All data generated or analyzed during this study are included in this published article [and its supplementary information files].
Consent for publication
This study was a retrospective analysis and informed consent from patients could be waived. The study was approved by the ethics committee of the six hospitals.
Contributorship
MZ: Formulation of overarching research goals and aims; oversight and leadership responsibility for the research activity planning and execution, including mentorship external to the core team. XH: Management and coordination responsibility for planning and executing the research activity. ZZ, TH, PW, YX, LZ, ZL, ZX, HL, XY, PH: Conducting the research and investigation process, specifically data collection. LL: Writing the initial draft (including substantive translation); programming and software development; designing computer programs; implementing computer code and supporting algorithms; testing existing code components; and applying statistical, mathematical, and computational methods to analyze data. All authors read and approved the final manuscript.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethical approval
The study was approved by the ethics committees of the six hospitals [Ethics committee of the 945th Hospital of the Joint Logistics Support Force of the Chinese People's Liberation Army (Ethical Application Ref: 202200015); Ethics committee of Mingshan District People's Hospital of Ya'an (Ethics Review 2022 No. 18); Medical Ethics Committee of Ya'an Polytechnic College Affiliated Hospital (Ethics Review 2022 No. 023); Ethics committee of Yucheng District People's Hospital of Ya'an (No. 202204032); Ethical review opinions of Ya'an People's Hospital (No. 2022–0014); Ethics committee of Ya'an Traditional Chinese Medicine Hospital (2022 No. 021)].
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
