Abstract
Objective:
Over 1 million new cases of hepatocellular carcinoma (HCC) are diagnosed worldwide every year. Its prognosis remains poor, with an estimated 5-year survival rate of 10% to 20% across all disease stages. Radiofrequency ablation (RFA) has become an important local treatment for liver cancer, and machine learning (ML) offers efficient tools for liver cancer research. We therefore explored the role of ML in predicting the total mortality of liver cancer patients undergoing RFA.
Methods:
This study is a secondary analysis of publicly available data on 578 liver cancer patients. We built the prognostic models with ML in Python.
Results:
The 5 most important factors were, in order, platelet count (PLT), alpha-fetoprotein (AFP), age, tumor size, and total bilirubin. For the total death model in the test group, gbm had the highest accuracy among the 5 algorithms (0.681), followed by Logistic (0.672). The area under the curve (AUC) values, from high to low, were Logistic (0.738), DecisionTree (0.723), gbm (0.717), GradientBoosting (0.714), and Forest (0.693). gbm had the highest precision (0.721), followed by Logistic (0.714), and DecisionTree had the highest recall (0.642), followed by GradientBoosting (0.571).
Conclusion:
Machine learning can predict total death after RFA in liver cancer patients. Therefore, ML research has great potential for both personalized treatment and prognosis of liver cancer.
Introduction
Hepatocellular carcinoma (HCC) is the second-most deadly form of cancer in China, killing approximately 300 000 people every year. This accounts for 55% of liver cancer deaths worldwide. Surgical resection is one of the most effective treatments for liver cancer, but only 10% to 20% of patients are candidates for it. 1 In recent years, with the development of minimally invasive and interventional techniques, radiofrequency ablation (RFA) has become an important local treatment for liver cancer because it is safe and involves minimal trauma. 2 It is also recognized as an effective treatment for early HCC, as defined by the Barcelona Clinic Liver Cancer (BCLC) staging system, among patients who do not meet the requirements for surgery. However, due to the high risk of liver cancer recurrence, the 5-year survival rate for HCC patients with tumors 5 cm in diameter before RFA is only 50%. 3 Moreover, targeted preventive measures and patient stratification schemes that tailor treatment algorithms to the specific conditions of individual patients, such as the application of liquid biopsy and multiparametric analysis in liver malignancy management, can improve prognosis. 4 The economic and medical burden caused by liver cancer is also great, and patients with liver cancer need more effective treatment methods to prolong their survival times and improve their quality of life. Therefore, an effective prognostic evaluation tool for liver cancer patients after RFA is needed.
Machine learning (ML) offers an alternative to standard predictive modeling that can address its current limitations, and by making better use of “big data” for algorithm development, it has the potential to revolutionize medicine. 5-7 At present, ML provides many shortcuts for medical research. 8 For example, ML methods can accurately predict the length of hospital stay for patients with heart disease, which can support clinical bed management and resource allocation 9 ; studies have shown that ML can predict acute kidney injury after liver cancer resection 10 ; and ML can also predict persistent depressive symptoms in older adults. 11 Similarly, studies have shown that ML can provide individualized patient profile analysis and build prediction models for the palliative treatment of malignant tumors in the liver. 12
At present, there are few studies on the application of ML among liver cancer patients undergoing RFA. Therefore, this study assesses 5 ML algorithms’ ability to predict the prognosis of liver cancer patients undergoing RFA.
Methods
Patients
Our study is a secondary analysis of data from the public BioStudies database (https://www.ebi.ac.uk/biostudies/studies/S-EPMC6059486). The original dataset includes 611 HCC patients who had been treated with RFA between January 2006 and December 2010. Of these, the present study covers the 578 enrolled patients whose serum ferritin levels were measured on admission before RFA. Radiofrequency ablation is typically performed in patients with 3 or fewer lesions less than 3 cm in diameter. In addition to serum ferritin level, the following variables were used: sex, body mass index (BMI), aspartate aminotransferase (AST) level, age, hepatitis C antibody positive, platelet count (PLT), hepatitis B surface antigen positive, alcohol consumption, number of tumors, hemoglobin (HB), Child-Pugh classification, tumor size, and alpha-fetoprotein (AFP) level.
Categorical variables were compared by Fisher’s exact test, while continuous variables were compared by 1-way analysis of variance (parametric) or the Kruskal–Wallis test (nonparametric). Differences were considered statistically significant when P < .05. Survival follow-up ended on December 31, 2015. We performed Pearson’s correlation analysis and ML modeling with Python. Five ML models were built: random forest, decision tree, logistic regression, LightGBM, and gradient boosted decision trees (GBDT). Eighty percent of the data were assigned to the training group for model development, and the remaining 20% formed the test group for validation. See Appendix 1 Table 4 for the parameters used in ML in this study.
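The group comparisons described above can be sketched as follows. This is a minimal illustration that assumes SciPy is available; the helper names and the toy numbers are hypothetical and are not taken from the study data or from the authors' code.

```python
from scipy.stats import f_oneway, fisher_exact, kruskal

ALPHA = 0.05  # significance threshold stated in the Methods


def compare_categorical(table_2x2):
    """Fisher's exact test on a 2x2 contingency table; returns the P value."""
    _, p_value = fisher_exact(table_2x2)
    return p_value


def compare_continuous(group_a, group_b, parametric=True):
    """1-way ANOVA (parametric) or Kruskal-Wallis test (nonparametric)."""
    test = f_oneway if parametric else kruskal
    _, p_value = test(group_a, group_b)
    return p_value


# Toy, made-up values for illustration only (NOT study data).
# Rows: nondeath / death; columns: two levels of a categorical variable.
category_table = [[8, 2], [1, 5]]
plt_nondeath = [15.2, 14.1, 16.8, 13.9, 15.5]
plt_death = [9.1, 8.4, 10.2, 7.9, 9.6]

p_category = compare_categorical(category_table)
p_plt_anova = compare_continuous(plt_nondeath, plt_death, parametric=True)
p_plt_kw = compare_continuous(plt_nondeath, plt_death, parametric=False)

for name, p in [
    ("categorical variable", p_category),
    ("PLT (ANOVA)", p_plt_anova),
    ("PLT (Kruskal-Wallis)", p_plt_kw),
]:
    print(f"{name}: P = {p:.3f}, significant = {p < ALPHA}")
```

Whether the parametric or nonparametric test applies to a given variable depends on its distribution; the sketch leaves that choice to the caller.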
Results
Comparison of each basic index between the 2 groups: Age differed significantly between the nondeath group and the death group in both the training group and the test group (P = .001 and P = .013). Platelet count also differed significantly between the nondeath group and the death group in both the training group and the test group (P < .001 and P = .049). The remaining comparisons are shown in Table 1.
The results of the correlation analysis showed that age, tumor size, and AFP were positively correlated with postoperative death in patients with liver cancer receiving RFA, whereas PLT was inversely correlated with the death outcome (Figure 1). The variable importance analysis showed that the 5 most important factors were, in order, PLT, AFP, age, tumor size, and total bilirubin (Figure 2).
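For readers unfamiliar with the statistic, Pearson's correlation coefficient between a clinical variable and the death outcome can be computed as in the sketch below. The function name and the toy values are hypothetical and serve only to illustrate the sign convention (positive versus inverse correlation); they are not study data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up toy values (NOT study data): 1 = death, 0 = nondeath.
death = [0, 0, 0, 1, 1, 1]
tumor_size_cm = [1.2, 1.5, 1.8, 2.4, 2.9, 3.0]
plt_count = [18.0, 16.5, 15.0, 9.5, 8.0, 7.5]

r_size = pearson_r(tumor_size_cm, death)  # positive in this toy example
r_plt = pearson_r(plt_count, death)       # negative in this toy example
print(f"tumor size vs death: r = {r_size:.2f}")
print(f"PLT vs death:        r = {r_plt:.2f}")
```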
Results of the total postoperative death model for liver cancer patients in the training group: Among the 5 algorithm models, Forest had the highest accuracy (0.900), followed by the GradientBoosting algorithm (0.831); among the 5 algorithms, the AUC values were, from high to low, Forest (0.971), GradientBoosting (0.914), gbm (0.825), DecisionTree (0.748), and Logistic (0.739). Among the 5 algorithms, Forest had the highest precision rate (0.904), followed by the GradientBoosting algorithm (0.795). Among the 5 algorithms, Forest had the highest recall rate (0.887), followed by GradientBoosting (0.874; see Table 2 and Figure 3).
Results of the total death model for liver cancer patients in the test group: Among the 5 algorithm models, the highest accuracy rate was that of gbm (0.681), followed by the Logistic algorithm (0.672); among the 5 algorithms, the AUC values, from high to low, were Logistic (0.738), DecisionTree (0.723), gbm (0.717), GradientBoosting (0.714), and Forest (0.693); among the 5 algorithms, gbm had the highest precision rate (0.721), followed by the Logistic algorithm (0.714). Among the 5 algorithms, DecisionTree had the highest recall rate (0.642), followed by the GradientBoosting algorithm (0.571; see Table 3 and Figure 4).
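The four metrics reported here (accuracy, precision, recall, and AUC) can be computed from true labels and predicted scores as in the following sketch. The function names, the 0.5 decision threshold, and the toy inputs are assumptions made for illustration; the source does not state how the authors computed the metrics.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for binary labels (1 = death)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        # Fall back to 0.0 when no positives were predicted or present.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def auc_score(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up toy labels and predicted probabilities (NOT study data).
y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = auc_score(y_true, scores)
print(metrics, auc)
```

Accuracy, precision, and recall depend on the chosen threshold, whereas AUC depends only on how the scores rank the patients, which is one reason the model rankings differ across the four metrics.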
Table 1. Basic patient characteristics.
Abbreviations: AFP, alpha-fetoprotein; anti-HCVAb, anti-hepatitis C virus antibody; AST, aspartate aminotransferase; HBsAg, hepatitis B surface antigen; ALT, alanine aminotransferase; PTINR, prothrombin time-international normalized ratio.

Figure 1. Factor correlations.

Figure 2. Variable importance of features included in the machine-learning algorithm for predicting postoperative death outcomes.

Table 2. Forecasted results for the training group.
Abbreviation: AUC, area under the curve.

Figure 3. Machine-learning algorithm predictions of postoperative death outcomes in the training group.

Table 3. Forecasted results for the testing group.
Abbreviation: AUC, area under the curve.

Figure 4. Machine-learning algorithm predictions of postoperative death outcomes in the test group.
Discussion
Radiofrequency ablation is considered the most effective first-line percutaneous ablation therapy. 13 Survival outcomes for patients who achieve complete remission with RFA have been shown to be equivalent to those of patients treated by hepatectomy, 14 but the 5-year recurrence rate may be as high as 80%. 15 Over clinical courses marked by frequent recurrence and retreatment, tumors tend to become uncontrollable, which is the primary reason for the low long-term survival rates after ablation. 16 The results of our study showed that the 5 most important factors were PLT, AFP, age, tumor size, and total bilirubin. The correlation analysis showed that age, tumor size, and AFP were positively correlated with death after RFA for liver cancer, whereas PLT was inversely correlated with the death outcome. After internal validation, all 5 ML algorithms were able to predict the prognosis of liver cancer patients undergoing RFA, with test-group AUC values between 0.693 and 0.738.
Albumin-bilirubin grade has been introduced to assess liver function in liver cancer patients. 17 In HCC patients undergoing transarterial chemoembolization (TACE), the serum prealbumin-bilirubin score (PALBI) grades were predictive of postoperative overall survival. 18 Similarly, albumin-bilirubin grade could predict survival in HCC patients undergoing TACE. 19 Furthermore, PLT predicts functional recovery and complications after hepatectomy. 20 The results of this study also suggest that total bilirubin and platelets are important risk factors for the prognosis of liver cancer patients undergoing RFA.
Anemia is a risk factor for death among liver cancer patients. 21 Hemoglobin changes have also been shown to be associated with overall survival in a variety of malignancies, including lung, breast, colorectal, and liver cancers, and a cohort study of breast, colorectal, and liver cancer reported the same association. 22 In addition, adding AFP and ascites to the BCLC staging classification may improve prognostic prediction for early and mid-term liver cancer. 23 Our results, in which AFP ranked among the most important factors, are consistent with this finding.
In patients with tumors >2 cm, the survival benefits of RFA have consistently been shown to exceed those of percutaneous ethanol injection (PEI). 16,24 Researchers also agree that RFA is advantageous in liver cancer treatment when a patient’s tumor is ⩽2 cm. 25 The results of our study likewise suggest that tumor size is an important factor affecting the prognosis of liver cancer patients undergoing RFA.
Machine learning is instrumental for the multiomics and multiparametric analysis essential to improving the overall management of liver malignancies and individual outcomes. In the next step of ML and liver cancer research, ML schemes based on biomarkers of chronic inflammation may be a useful tool in prediction and prevention. 26 Moreover, to facilitate the interpretability of ML in liver cancer research, it would be of great significance and scientific value to incorporate multigroup learning. 27
Our study has several limitations. First, because this was a retrospective study, it was impossible to collect all of the data from the HCC patients, which may have biased model performance. Second, data on molecular factors, which may influence patient prognosis, were lacking. Finally, the models have only been internally validated. Thus, further prospective and multicenter studies are needed.
Conclusion
Our results show the weight that each factor contributes to the death outcome, as derived from the ML gbm algorithm. The 5 most important factors, ranked in order, were PLT, AFP, age, tumor size, and total bilirubin. Furthermore, ML algorithms can improve prognosis prediction for liver cancer patients undergoing RFA.
In sum, prevention, prediction, and individualization are of great significance to the prognosis of patients with liver cancer. 28 As such, ML research has great potential for personalized treatment and prognosis of liver cancer.
Footnotes
Appendix 1
Functions, packages, and tuning parameters used in Anaconda for each machine learning algorithm.
| Algorithm | Classifier | Package | Tuning parameters |
|---|---|---|---|
| Logistic regression | LogisticRegression | from sklearn.linear_model import LogisticRegression | penalty = 'l2', tol = 0.000001, C = 0.1, fit_intercept = True, intercept_scaling = 1, class_weight = None, max_iter = 100, multi_class = 'ovr', verbose = 0, warm_start = False, n_jobs = 1 |
| DecisionTree | DecisionTreeClassifier | from sklearn.tree import DecisionTreeClassifier | splitter = 'best', max_depth = 3, min_samples_split = 30, min_samples_leaf = 2, min_weight_fraction_leaf = 0.01 |
| forest | RandomForestClassifier | from sklearn.ensemble import RandomForestClassifier | n_estimators = 50, n_jobs = -1, min_samples_split = 20, min_samples_leaf = 2, random_state = 41 |
| GradientBoosting | GradientBoostingClassifier | from sklearn.ensemble import GradientBoostingClassifier | learning_rate = 0.2, n_estimators = 20, max_depth = 3, min_samples_split = 20, min_samples_leaf = 5 |
| gbm | lgb.LGBMClassifier | lightgbm 2.2.0 | boosting_type = 'gbdt', objective = 'binary', metrics = 'auc', learning_rate = 0.1, n_estimators = 100, max_depth = 2, bagging_fraction = 0.5, feature_fraction = 0.5 |
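
The settings in the table, combined with the 80%/20% split described in the Methods, could be assembled roughly as follows. This is a hedged sketch rather than the authors' code: the data are synthetic (generated with `make_classification`), the split seed is arbitrary, only the non-default logistic regression settings are passed because the other listed values match scikit-learn defaults and some are deprecated in recent releases, and the LightGBM model is built only if that package is installed, without the `metrics` entry.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 578-patient table (NOT the study data).
X, y = make_classification(
    n_samples=578, n_features=14, n_informative=5, random_state=0
)

# 80% training / 20% test split; the seed is an arbitrary assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "Logistic": LogisticRegression(C=0.1, tol=0.000001, max_iter=100),
    "DecisionTree": DecisionTreeClassifier(
        splitter="best", max_depth=3, min_samples_split=30,
        min_samples_leaf=2, min_weight_fraction_leaf=0.01,
    ),
    "Forest": RandomForestClassifier(
        n_estimators=50, n_jobs=-1, min_samples_split=20,
        min_samples_leaf=2, random_state=41,
    ),
    "GradientBoosting": GradientBoostingClassifier(
        learning_rate=0.2, n_estimators=20, max_depth=3,
        min_samples_split=20, min_samples_leaf=5,
    ),
}

try:
    # Optional: LightGBM is a separate package and may not be installed.
    from lightgbm import LGBMClassifier

    models["gbm"] = LGBMClassifier(
        boosting_type="gbdt", objective="binary", learning_rate=0.1,
        n_estimators=100, max_depth=2, bagging_fraction=0.5,
        feature_fraction=0.5, verbose=-1,
    )
except ImportError:
    pass

# Fit each model on the training group and score it on the test group.
test_auc = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    test_auc[name] = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

for name, value in sorted(test_auc.items(), key=lambda kv: -kv[1]):
    print(f"{name}: test AUC = {value:.3f}")
```

Because the inputs are synthetic, the printed AUC values bear no relation to those reported in Tables 2 and 3.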
Acknowledgements
The authors are grateful to the public BioStudies Database for including and providing Professor Tateishi’s original data. Koji U, Ryosuke T, Ryo N, et al. Serum levels of ferritin do not affect the prognosis of patients with hepatocellular carcinoma undergoing radiofrequency ablation. PLoS ONE. 2018;13:e0200943.
Funding:
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests:
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Author Contributions
All authors contributed to this study’s conceptualization. C.-M.Z. and J.-J.Y. contributed to project administration work. All authors contributed to writing, review, and editing the work.
Availability of Data and Materials
Data are available at the BioStudies Database, accession number: S-EPMC6059486.
Ethics Approval and Consent to Participate
Our study did not require the approval of an ethics committee, as it was a secondary analysis of the BioStudies public database which is open to the public.
