Abstract
Objective
We evaluated the performance of general and joint machine learning algorithms in classifying bone metastasis in patients with lung adenocarcinoma.
Methods
We used R version 3.5.3 for statistical analysis of the general information, and Python to construct machine learning models.
Results
We first ranked the features using the average of 4 machine learning classifiers; race, sex, surgical status and marital status were the top 4 factors affecting bone metastasis. In the training group, the area under the curve (AUC) of every classifier except RF and LR was greater than .8, but the joint algorithms did not improve on the AUC of any single machine learning algorithm. The accuracy of every classifier except RF was higher than 70%, and only the LGBM algorithm achieved a precision higher than 70%. In the test group, the AUC of every classifier except RF and LR was likewise greater than .8, and the joint algorithms again did not improve on the AUC of any single machine learning algorithm. The accuracy of every classifier except RF was higher than 70%. The LGBM algorithm had the highest precision, at .675.
Conclusion
The results of this proof-of-concept study show that machine learning classifiers can distinguish bone metastasis in patients with lung cancer. This provides a new research direction for the future use of non-invasive technology to identify bone metastasis in lung cancer. However, more prospective multicenter cohort studies are needed.
Introduction
Lung cancer has become one of the most common malignant tumors in the world, with high incidence and mortality. 1 The skeletal system is the most common site of lung cancer metastasis, and bone metastasis seriously affects the treatment and prognosis of patients with lung cancer. 2 Studies have shown that 15% to 40% of patients have bone metastasis when lung cancer is first diagnosed. 3 Some studies have also suggested that the prevalence of bone marrow micrometastasis in patients with lung cancer is 22% to 60%. 4 Moreover, a recent study reported that the bone metastasis rate for lung cancer patients in some areas has reached 48% for stage IV non-small cell lung cancer and 40% for extensive-stage small cell lung cancer. 5 The regional difference in bone metastasis incidence is correlated not only with disease stage, histologic type, survival time, treatment and other factors, but also with diagnostic methods. Thus, it would be of great significance to diagnose bone metastasis in the early stages and provide timely treatment. However, the current diagnosis of bone metastasis in patients with lung cancer relies on regular bone imaging screening, after which suspicious cases are further evaluated by imaging or pathology. 6 A study has shown that bone scans and MRI can complement each other: bone scans can be used for general examinations, while MRI can locate lesions that are easy to miss in a bone scan, especially isolated metastatic lesions and metastatic lesions in the spine. 7 Another study has shown that the diagnostic rate and lesion detection rate of PET/CT in patients with bone metastasis from lung cancer were much higher than those of MRI and SPECT. 8 In addition to PET/CT and MRI, the clinical application of ultrasound is maturing.9,10 Ultrasound provides a rapid and less invasive method of diagnosis and staging for lung cancer. 11 Moreover, chest ultrasound can also be used to detect rib metastases from non-small cell lung cancer. 12
However, these radiologic examinations are prone to high false positive rates and are expensive. Additionally, many medical facilities lack the necessary equipment. Therefore, there is an urgent need for a new, effective, economical and routine detection method to screen for bone metastasis in patients with lung cancer in the initial stages.
Machine learning technology can improve the diagnosis and prediction efficacy in the field of clinical medicine. Studies have shown that the prognosis of patients receiving chemotherapy can be predicted by machine learning. 13 Other studies have shown that the immune microenvironment of prostate cancer can predict disease progression. 14 Other studies have shown that novel autophagy-related lncRNA can predict the prognosis of colonic adenocarcinoma. 15 Other studies have shown that the complement system is closely related to the coagulation cascade and endometriosis. 16 Studies have shown that machine learning methods can distinguish people who are allergic to food by epigenetic biomarkers. 17 Other studies have shown that machine learning can predict and diagnose hemodynamic instability in surgery patients by physiological waveforms and electronic health records. 18 Additionally, it has been reported that a new machine learning technique for accurate diagnosis of coronary artery disease has been developed. Moreover, studies have shown that machine learning algorithms based on medical data can diagnose patients with ankylosing spondylitis. 19 In addition, the current research on artificial intelligence-related bone metastasis has focused on radiologic data with small sample sizes.20–22
Therefore, we explored 9 machine learning algorithms (including general machine learning algorithms and joint machine learning algorithms) to distinguish bone metastasis among elderly stage IV lung cancer patients.
Methods
General Information
A total of 27 627 patients with advanced lung cancer from the Surveillance, Epidemiology, and End Results (SEER) database between January 2010 and December 2015 were included in this study, and they were divided into a bone metastasis group (11 147 cases) and a non-bone metastasis group (16 480 cases). The inclusion criteria were elderly patients (aged 60 years or older) diagnosed with stage IV lung cancer according to the seventh edition of the UICC/AJCC clinical staging system; the exclusion criteria were undefined staging, pathological features, or indicators. This study was exempt from review by the local institutional review board because it was a secondary data analysis of the SEER public database, and because the information in this database was anonymous and publicly available for relevant medical research worldwide.
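The group sizes above determine the class balance that any classifier has to beat. The short check below is a sketch that uses only the counts reported in this section (the variable names are illustrative, not from the original analysis). It confirms that the two groups add up to the stated cohort size, and computes the bone metastasis prevalence and the accuracy of a trivial rule that always predicts the larger group.

```python
# Counts reported in the General Information section.
bone_mets = 11_147       # bone metastasis group
no_bone_mets = 16_480    # non-bone metastasis group

total = bone_mets + no_bone_mets
prevalence = bone_mets / total

# Accuracy of a trivial classifier that always predicts the majority class.
majority_baseline_accuracy = max(bone_mets, no_bone_mets) / total

print(total)                                  # 27627
print(round(prevalence, 3))                   # 0.403
print(round(majority_baseline_accuracy, 3))   # 0.597
```

Because roughly 60% accuracy is reachable without using any feature, accuracy figures are easier to interpret alongside precision, recall and AUC.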
Research Methods and Research Contents
The following clinicopathological data from patients diagnosed with stage IV lung adenocarcinoma from 2010 to 2015 were collected from the SEER database using SEER*Stat 8.3.5: sex, race, age, marital status, diagnostic age, primary location, histological grade, T stage (seventh edition of UICC/AJCC TNM stage), N stage, M stage, surgical status, tumor location, radiotherapy and chemotherapy, liver metastasis, lung metastasis, brain metastasis and bone metastasis.
Machine Learning Method
Nine machine learning algorithms were used: logistic regression (LR), random forest (RF), gradient boosting decision tree (GBDT), XGBoost (XGB), LightGBM (LGBM), RF + LR, LGBM + LR, GBDT + LR and XGB + LR. Logistic regression is a classical statistical learning classification method, and the basic model can be used for two-class learning. Random forest is a classifier composed of multiple decision trees, and its output category is the mode of the categories output by the individual trees. Gradient boosting decision tree is an iterative decision tree algorithm that consists of multiple decision trees, and the outputs of all trees are added up for the final answer. XGBoost is a boosting tree model that integrates many tree models to form a strong classifier. LightGBM (Light Gradient Boosting Machine) is an open source gradient boosting framework, and one of the frameworks implementing the GBDT algorithm, that supports efficient parallel training.
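The text does not specify how the joint algorithms (for example GBDT + LR) combine their two components. One widely used construction, shown here purely as an assumption and not as the authors' confirmed method, feeds the leaf indices of a fitted tree ensemble into a logistic regression as one-hot features. The sketch below uses scikit-learn on synthetic data; the dataset, hyperparameters and variable names are all illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in data (NOT the SEER cohort), split 70/30.
X, y = make_classification(n_samples=2000, n_features=15,
                           n_informative=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Step 1: fit the tree ensemble on its own.
gbdt = GradientBoostingClassifier(n_estimators=50, max_depth=3, random_state=0)
gbdt.fit(X_train, y_train)


def leaf_indices(model, X):
    """Return the leaf each sample reaches in every tree.

    For binary problems apply() has shape (n_samples, n_estimators, 1),
    so the last axis is dropped.
    """
    return model.apply(X)[:, :, 0]


# Step 2: one-hot encode the leaf indices and fit LR on the encoded features.
encoder = OneHotEncoder(handle_unknown="ignore")
leaves_train = encoder.fit_transform(leaf_indices(gbdt, X_train))
lr = LogisticRegression(max_iter=1000)
lr.fit(leaves_train, y_train)

# Step 3: compare the single model with the assumed joint model on the test split.
leaves_test = encoder.transform(leaf_indices(gbdt, X_test))
auc_gbdt = roc_auc_score(y_test, gbdt.predict_proba(X_test)[:, 1])
auc_joint = roc_auc_score(y_test, lr.predict_proba(leaves_test)[:, 1])
print(f"GBDT AUC: {auc_gbdt:.3f}  GBDT + LR AUC: {auc_joint:.3f}")
```

For brevity the sketch fits both stages on the same training split; fitting the logistic regression on a separate part of the training data is a common way to limit overfitting of the second stage.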
The study patients were divided into training and testing groups at a ratio of 7:3, and we adopted 5-fold cross-validation. We used the training group to train the machine learning models, and then evaluated model performance in the test group. To tune the parameters, we combined manual adjustment with grid search. In addition, to simplify the results of the machine learning classifiers, we used 4 machine learning classifiers to calculate and quantify each feature’s ranking. All of the data were normalized, and we used accuracy, precision, recall and AUC (area under the curve) to evaluate model performance. Accuracy is the ratio of correct predictions to all predictions. Precision is the proportion of samples predicted as a given class that actually belong to that class. Recall is the number of correctly predicted positive samples divided by the number of all positive samples. AUC is the area under the receiver operating characteristic (ROC) curve.
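The four evaluation metrics defined above can be written out directly. The following is a minimal pure-Python sketch on a made-up toy example (the labels, scores and the .5 decision threshold are assumptions for illustration, not data from this study). AUC is computed through its rank interpretation: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def precision(y_true, y_pred):
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(y_true, y_pred):
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 3 positive and 5 negative samples (illustrative values only).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.55, 0.2, 0.1, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(accuracy(y_true, y_pred))   # 0.625
print(precision(y_true, y_pred))  # 0.5
print(recall(y_true, y_pred))
print(auc(y_true, scores))
```

Accuracy, precision and recall depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why a model can have a high AUC and still show modest precision.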
Statistical Analysis
We used R version 3.5.3 for statistical analysis of the general information, and the categorical variables were expressed with count and percentage. The quantitative variables were expressed with mean ± standard deviation. We conducted a Student’s t-test with statistical significance set at
Results
General Information
Clinical Characteristics of Patients with Lung Cancer.
Note: P > .05 indicated that there was no significant difference; .01 < P < .05 indicated a significant difference and was marked with *; P < .01 indicated the most significant difference and was marked with **.
Correlation and Feature Ranking Analysis
M stage, metastasis and tumor size were significantly correlated with bone metastasis. Female sex and radiotherapy were each weakly negatively correlated with bone metastasis (Figure 1). We first ranked the features using the average of the 4 machine learning classifiers. The results showed that race, sex, surgical status and marital status were the top 4 factors affecting bone metastasis (Figure 2).
Figure 1. Correlation between Clinical Characteristics Data.
Figure 2. Ranking Results for Bone Metastasis Feature Weights of Average Algorithm.
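The text does not detail how the rankings of the 4 classifiers were averaged. One simple possibility, given here only as an assumed sketch, is to rescale each model's feature importances so they sum to 1 and then average them per feature. The model names, feature names and numbers below are placeholders and are not results from this study.

```python
def average_importance(importances_by_model):
    """Normalize each model's importances to sum to 1, then average per feature.

    Returns a list of (feature, mean_importance) sorted from most to least
    important.
    """
    n_models = len(importances_by_model)
    features = sorted({f for imp in importances_by_model.values() for f in imp})
    averaged = {f: 0.0 for f in features}
    for imp in importances_by_model.values():
        total = sum(imp.values())
        for f in features:
            averaged[f] += imp.get(f, 0.0) / total / n_models
    return sorted(averaged.items(), key=lambda kv: kv[1], reverse=True)


# Placeholder importances on different scales (hypothetical values).
importances = {
    "model_1": {"feature_a": 4, "feature_b": 3, "feature_c": 2, "feature_d": 1},
    "model_2": {"feature_a": 0.1, "feature_b": 0.5, "feature_c": 0.3, "feature_d": 0.1},
    "model_3": {"feature_a": 30, "feature_b": 20, "feature_c": 40, "feature_d": 10},
    "model_4": {"feature_a": 0.2, "feature_b": 0.2, "feature_c": 0.2, "feature_d": 0.4},
}

ranking = average_importance(importances)
for name, weight in ranking:
    print(f"{name}: {weight:.3f}")
```

Normalizing first keeps a model that reports importances on a larger scale (such as split counts) from dominating the average.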

Machine Learning Results in the Training Group
Model Results for Training Group.
Abbreviations: LR, logistic regression; RF, random forest; GBDT, gradient boosting decision tree; XGB, XGBoost; LGBM, LightGBM. Joint algorithms: RF + LR, LGBM + LR, GBDT + LR, XGB + LR.

Machine Learning Algorithm Results for Training Group for Bone Metastasis.
Abbreviations: LR, logistic regression; RF, random forest; GBDT, gradient boosting decision tree; XGB, XGBoost; LGBM, LightGBM. Joint algorithms: RF + LR, LGBM + LR, GBDT + LR, XGB + LR.
Machine Learning Results in the Test Group
Model Results for Testing Group.
Abbreviations: LR, logistic regression; RF, random forest; GBDT, gradient boosting decision tree; XGB, XGBoost; LGBM, LightGBM. Joint algorithms: RF + LR, LGBM + LR, GBDT + LR, XGB + LR.

Machine Learning Algorithm Results for Testing Group for Bone Metastasis.
Abbreviations: LR, logistic regression; RF, random forest; GBDT, gradient boosting decision tree; XGB, XGBoost; LGBM, LightGBM. Joint algorithms: RF + LR, LGBM + LR, GBDT + LR, XGB + LR.
Discussion
Lung cancer remains a malignant tumor with high incidence and mortality throughout the world, and 50% to 70% of patients suffer bone metastasis complications. 23 The overall prognosis of lung cancer patients with bone metastasis is still poor, and the incidence of lung cancer with bone metastasis remains on the rise. 24 Some lung cancer patients with bone metastases may not show any clinical symptoms, and the diagnosis relies primarily on radiologic examinations, such as CT, ECT and MRI. These examinations are expensive, carry radiation risk, and have low sensitivity and specificity. They also cannot dynamically monitor changes in bone metabolism. This results in delayed diagnosis and treatment of bone metastasis.25–27 Therefore, there is an urgent need to establish a new early warning system to indicate the risk and presence of bone metastasis, so as to promote the prevention and treatment of lung cancer. We ranked the features based on the differentiation performance of the average of the 4 machine learning classifiers. The results show that race, sex, surgical status and marital status were the 4 factors most affecting bone metastasis. Moreover, machine learning classifiers can distinguish bone metastasis in patients with lung cancer, but the joint algorithms did not improve on the performance of any single machine learning algorithm.
Race, sex and marital status are associated with bone metastasis in cancer patients. However, the effect of race on the prognosis of patients with non-small cell lung cancer is still controversial.28,29 Studies have shown that race is associated with bone metastasis in patients with bladder cancer. 30 Studies have also shown strong associations of sex and race with bone metastasis in patients with nasopharyngeal carcinoma. 31 Other studies have shown that race does not predict bone metastasis in men with non-metastatic castration-resistant prostate cancer. 32 Additional studies have shown that women have a lower risk of bone metastasis and a good prognosis in non-sex-specific cancers. 33 It has also been shown that being male is positively correlated with lymphoid metastasis and bone metastasis of lung cancer. 34 In addition, studies have shown that the independent risk factors for BM in patients with liver cancer are sex, T stage and N stage. 35 Moreover, studies have shown that sex and marital status are independent risk factors for brain metastasis in patients with melanoma. 36 Our results also indicate that race and sex influence bone metastasis in patients with lung cancer.
The prognostic benefits of surgical treatment for patients with advanced lung cancer are still controversial. 37 Previous studies based on SEER program analysis have shown that further surgical treatment of patients with advanced lung cancer is not recommended. 38 However, some studies have shown that the quality of life and 5-year survival rate of stage IV NSCLC patients undergoing pneumonectomy and extended chest wall resection were improved. 39 In addition, studies have shown that surgery may improve the quality of life and survival rate of patients with bone metastasis from gastric cancer. 40 Our study has shown that surgery is one of the main factors affecting bone metastasis in patients with lung cancer.
Machine learning has both advantages and disadvantages compared with other examination tools (such as MRI, CT and ultrasound) for distinguishing bone metastasis in lung cancer. Advantages: compared with other tools, machine learning is less invasive for patients, costs less for large-scale screening in the general population, and places lower demands on the testing environment. Disadvantages: at present, the clinical application of machine learning in this field remains immature, and its value in evaluating therapeutic effects on bone metastases and dynamically monitoring disease progression remains unclear and warrants further research.
This study has several limitations. First, only a limited amount of information can be extracted from the SEER database, so it is impossible to use multimodal data to construct bone metastasis models. Second, more specific details on bone metastasis could not be included because the relevant data are not available. Finally, the sequence of different metastatic sites could not be determined. Multicenter prospective cohort studies are needed in the future.
Conclusion
The results of this proof-of-concept study show that machine learning classifiers can distinguish bone metastasis in patients with lung cancer. This provides a new research direction for identifying bone metastasis of lung cancer by non-invasive technology in the future. However, more prospective multicenter cohort studies are required. In addition, the performance of these algorithms needs further improvement.
Footnotes
Acknowledgments
None.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Availability of Data and Information
Ethical Approval and Consent to Participate
This study was exempt from review by the local institutional review board because it was a secondary data analysis of the SEER public database, and because the information in this database was anonymous and publicly available for relevant medical research worldwide.
