Abstract
Purpose
This study aimed to conduct a large-scale population-based study to understand the epidemiological characteristics of Primary Malignant Bone Tumors (PMBTs) and determine the prognostic factors by concurrently using the classical statistical method and data mining methods.
Methods
Patients included in this study were extracted from the National Cancer Institute’s Surveillance, Epidemiology and End Results (SEER) database: “Incidence-SEER Research Data, 18 Registries, Nov 2020 Sub”. Patients with unclassified and incomplete information were excluded. This search algorithm resulted in a dataset comprising 6234 cases. Survival analyses were performed with Kaplan-Meier curves and the Log-rank test. Multivariate Cox regression analysis determined the independent prognostic factors of PMBT. A decision tree-based data mining technique was used in this study to confirm the prognostic factors.
Results
The 5-year survival rate was 63.6% and the 10-year survival rate was 55.3% in patients with PMBT. Sex, age, median household income, histology, primary site, grade, stage, metastasis, and the total number of malignant tumors were identified as independent risk factors associated with overall survival (OS) in the multivariate Cox regression analysis. The prognostic factors resulting in five terminal nodes in the decision tree (DT) were stage, age, and grade. Stage was the most important determining factor for vital status. The terminal node with the poorest survival included 801 (72.3%) deaths in 1102 patients with distant stage, and the hazard ratio was calculated as 5.4 (95% CI: 4.9–5.9).
Conclusions
Rules extracted from DTs provide information about risk factors in specific patient groups and can be used by clinicians making decisions on individual patients. We recommend using DTs in combination with Cox regression analysis to determine risk factors and the effect of these factors on survival.
Introduction
Metastatic tumors that have spread to bone from other parts of the body are the most common type of bone malignancy. Primary malignant bone tumors (PMBTs) are rare and often aggressive tumors that constitute less than 1% of all cancers in adults.1,2 Osteosarcoma, chondrosarcoma, and Ewing’s sarcoma are the most frequently diagnosed PMBTs. 3 The management of PMBTs can be complex in terms of diagnosis, follow-up, and treatment. These patients should be managed in tertiary centers with orthopaedics, radiology, pathology, oncology, radiation oncology, and nuclear medicine departments. The management of PMBT has recently improved dramatically. Nowadays, most patients undergo limb-sparing procedures, and survival rates have improved significantly.
Data mining techniques, including neural networks, decision trees (DTs), artificial intelligence, machine learning, and genetic algorithms, are used to uncover previously unknown relationships and patterns between variables in large volumes of data. Such patterns may not be apparent through classical statistical methods, which are primarily used to test specific hypotheses.4–6 Decision trees create a flow chart-like model that starts from a root node and proceeds through multiple internal nodes until it reaches the leaf nodes. They can handle discrete and continuous variables, missing values, and skewed continuous data, which they divide into ranges. 7 Additionally, decision trees are easy to understand, unambiguous, and can be compared with traditional statistical techniques in medicine.5,6 Using decision tree-based data mining, we can identify critical features in a dataset that significantly impact PMBT prognosis.
This study aimed to conduct a large-scale population-based study to understand the epidemiological characteristics of PMBTs and determine the prognostic factors by concurrently using the classical statistical method and decision tree-based data mining methods.
Material and methods
The National Cancer Institute’s Surveillance, Epidemiology and End Results (SEER) Program [http://seer.cancer.gov/] is a cancer registry covering 47.9% of the USA population 8 that provides a limited-use data set for further analyses by researchers and the public. 9 SEER reports patients’ demographic data, cancer incidence, and survival at the individual case level. Histological tumor type and tumor localization are currently classified according to the third edition of the International Classification of Diseases for Oncology (ICD-O-3).10,11 The SEER research contract was signed online to access the database. All patient identifiers and personal data are removed from the SEER database. As a result, studies using the SEER database are exempt from institutional review board approval. 12 Neither ethics committee approval nor patient approval forms are necessary to use data on the SEER platform. SEER*Stat software (version 8.3.9.2) was used to retrieve data from SEER; the software can be downloaded from http://seer.cancer.gov/seerstat/website.
Selection of the study cohort and data collection
Overview of 14,604 primary malignant bone tumors from the 2000–2018 SEER data (2020 release).
ICD-O-3: international classification of diseases for oncology (ICD-O), 3rd ed.

Flow chart of patient selection from the SEER database.
Statistical analysis
The classical statistical methods
Statistical analysis was performed with the SPSS 22.0 (IBM Corp, Armonk, NY, USA) package. Categorical variables were presented as numbers and percentages, and continuous variables were reported as mean ± standard deviation (SD) and median (interquartile range: IQR). Chi-square tests were used to compare categorical variables between groups. The normality of continuous variables was assessed visually (histograms and probability plots) and analytically (Kolmogorov-Smirnov/Shapiro-Wilk tests). The Kruskal–Wallis test was used to compare non-normally distributed variables. Survival analyses were performed with Kaplan-Meier curves and the log-rank test. Variables with
Decision tree analysis: Recursive partitioning analysis (RPA)
Variables for recursive partitioning analysis.
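As an illustration of how recursive partitioning chooses a split, the following minimal sketch scores candidate one-versus-rest splits on a categorical variable and keeps the one with the lowest weighted impurity. The text does not state which splitting criterion was used, so Gini impurity is an assumption here, and the records, the function names `gini` and `best_split`, and the field names are hypothetical toy values, not SEER data.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels, feature):
    """Return (weighted impurity, category) of the best one-vs-rest split.

    Each category of `feature` is tried as the left branch, with all
    other categories forming the right branch.
    """
    best = None
    for category in sorted({row[feature] for row in rows}):
        left = [y for row, y in zip(rows, labels) if row[feature] == category]
        right = [y for row, y in zip(rows, labels) if row[feature] != category]
        if not left or not right:
            continue  # a split must send patients down both branches
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[0]:
            best = (score, category)
    return best


# Hypothetical toy records (assumption: not taken from the study data).
rows = [
    {"stage": "localized", "grade": "low"},
    {"stage": "localized", "grade": "high"},
    {"stage": "regional", "grade": "low"},
    {"stage": "regional", "grade": "high"},
    {"stage": "distant", "grade": "low"},
    {"stage": "distant", "grade": "high"},
]
labels = ["alive", "alive", "alive", "dead", "dead", "dead"]

stage_split = best_split(rows, labels, "stage")
grade_split = best_split(rows, labels, "grade")
```

In this toy example the split on stage gives a lower weighted impurity than the split on grade, so a tree grown this way would split on stage first. Applying the same search again inside each branch is what makes the procedure recursive.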
Results
Characteristics of the study population
In this study, the median age of the patients was 42 (19–61) years, and 55.8% of cases were male. The most commonly observed histopathological types of PMBT were osteosarcoma (41%) and chondrosarcoma (40.2%). As shown in Figure 2, which illustrates the distribution of histopathological types among age groups, osteosarcoma was the most frequently observed type in individuals under the age of 15, accounting for 77.4% of cases in this group. In individuals aged 15-44, the most frequently observed histopathological types were osteosarcoma (50.7%) and chondrosarcoma (32.4%). The most common PMBT in the groups aged 45 and above was chondrosarcoma (Figure 2).
Distribution of histological types by age groups.
Clinical characteristics and survival outcomes of patients.
SD: standard deviation, IQR: interquartile range (25P–75P).
Survival analysis results using classical statistical method
Overall survival rates according to factors.
NA: not available.
aSince the median survival time could not be reached, the mean ± standard error is presented.

(a) Kaplan-Meier curves for overall survival, (b) Kaplan-Meier curves for overall survival according to histology.
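The Kaplan-Meier curves rest on the product-limit estimator: at each observed death time, the running survival probability is multiplied by the fraction of at-risk patients who survive that time. The sketch below is a minimal pure-Python version for illustration only; the function name `kaplan_meier` and the follow-up data are hypothetical, and the study itself used SPSS.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  -- follow-up time of each patient (any consistent unit)
    events -- 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each death time.
    """
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        # Patients still under observation just before time t.
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
    return curve


# Hypothetical follow-up times in months (assumption: not SEER data).
times = [6, 12, 12, 20, 30, 45, 60, 60]
events = [1, 1, 0, 1, 0, 1, 0, 0]
curve = kaplan_meier(times, events)
```

Censored patients contribute to the at-risk count until their last follow-up but never trigger a drop in the curve, which is why the estimate only changes at observed death times.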
Multivariate Cox regression analysis for overall survival (OS) for patients identified in the SEER program database.
Survival analysis results using decision-tree model
In the DT analysis, we identified the variables that play important roles in explaining vital status (Table 5). Stage was the most important determining factor for vital status. This first-level split produced the two initial branches of the classification tree: localized and regional stage versus distant stage. The DT identified the following five nodes (groups) with different survival probabilities (Figure 4): (Node 1) localized/regional stage and age < 65 years; (Node 2) localized/regional stage, grade ≤ 2, and age 65–74 years; (Node 3) localized/regional stage, grade ≤ 2, and age ≥ 75 years; (Node 4) localized/regional stage, age ≥ 65 years, and grade ≥ 3; (Node 5) distant stage. Thus, three prognostic factors for survival were identified (stage, age, and grade), resulting in five terminal nodes (Figure 4). Seventy-five percent of patients in Node 1 were alive, and 72.3% of patients in Node 5 had died.
Decision tree constructed by recursive partitioning analysis from 6234 patients with malignant bone tumors. The proportions of alive (blue) and dead (green) patients are represented at each node.
The survival time of the patients in Nodes 3, 2, and 1 was longer than that of the patients in Nodes 4 and 5.
Kaplan-Meier curves for overall survival according to decision tree nodes.
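The five terminal nodes reported for the decision tree can be written as a short rule set. The sketch below follows the node definitions given in the text; the function name `assign_node`, the lowercase stage strings, and the encoding of grade as an integer are assumptions made for illustration.

```python
def assign_node(stage, age, grade):
    """Map a patient to one of the five terminal nodes described in the text.

    stage -- 'localized', 'regional', or 'distant' (assumed string encoding)
    age   -- age in years
    grade -- tumor grade as an integer (assumed numeric encoding)
    """
    if stage == "distant":
        return 5  # Node 5: distant stage
    if stage not in ("localized", "regional"):
        raise ValueError(f"unknown stage: {stage!r}")
    if age < 65:
        return 1  # Node 1: localized/regional stage and age < 65
    if grade >= 3:
        return 4  # Node 4: localized/regional, age >= 65, grade >= 3
    if age < 75:
        return 2  # Node 2: localized/regional, grade <= 2, age 65-74
    return 3      # Node 3: localized/regional, grade <= 2, age >= 75


# Example patients (hypothetical).
examples = [
    assign_node("localized", 40, 3),
    assign_node("regional", 70, 2),
    assign_node("localized", 80, 1),
    assign_node("regional", 68, 4),
    assign_node("distant", 30, 1),
]
```

Note that grade only matters for localized or regional patients aged 65 and over; younger patients fall into Node 1 regardless of grade, and distant-stage patients fall into Node 5 regardless of age or grade.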
Discussion
In this study, we applied DT analysis alongside traditional analysis methods to the survival of patients with primary malignant bone tumors extracted from SEER records between 2000 and 2018. We used the SEER database because it provides access to real patient data with a large sample. With 6234 patients, this study is one of the largest series on primary malignant bone tumors. In the DT model, stage, grade, and age were the variables that appeared to explain the prognosis. This study focused solely on overall survival. Among the histological types, chondrosarcoma had the longest survival time. Although the survival times of osteosarcoma and Ewing sarcoma were shorter than that of chondrosarcoma, no significant difference was observed between the survival times. The other PMBTs had a median survival time of 50 (38.3–61.7) months, and this group had a shorter survival time than the three common PMBTs.
Decision trees are visualization and prediction tools that belong to the family of supervised learning algorithms used to solve regression and classification problems. One advantage of DTs is that they can be implemented quickly and are easy to interpret. 13 The main purpose of our decision tree was to visualize the data and create a diagram that we can use for discussion. Stage, age, and grade were the strongest prognostic factors in both our DT model and the multivariate Cox regression analysis. The multivariate Cox regression model is also the most common tool for simultaneously investigating the influence of several factors on the survival time of patients. 14 Other factors associated with poor prognosis in the multivariate Cox regression model were male sex, low income, pelvic and vertebral location, metastasis, and having two or more tumors. In this model, high grade was associated with a 3-fold decrease in survival, while being 75 years or older was associated with an 8-fold reduction in survival. Distant or metastatic tumors were also associated with a 2.5-fold reduction in survival. Wang et al. 15 identified stage, age, and grade as the factors most associated with survival in a sample of patients with primary bone sarcoma of the hands or feet from the SEER database. In the study by Fukushima et al., 16 age and histological grade were found to be the two strongest prognostic markers for bone sarcomas: those aged 65 and over had 4.4 times higher mortality, and those with high histological grade had 3.8 times higher mortality. Histological cancer grade, an indicator of tumor cell differentiation used to express the degree of cancer progression, is one of the most important prognostic markers.17,18
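Cox regression results are usually reported as hazard ratios, which are exponentiated regression coefficients. As a rough worked example, the sketch below recovers the log-hazard coefficient and an approximate standard error from a published hazard ratio and its 95% confidence interval, using the distant-stage node values reported in the abstract (5.4, 95% CI 4.9–5.9). It assumes a Wald interval that is symmetric on the log scale; the function names `log_hr_and_se` and `hr_with_ci` are hypothetical.

```python
import math


def log_hr_and_se(hr, ci_low, ci_high, z=1.96):
    """Recover the Cox coefficient and its approximate standard error.

    Assumes the interval was built as exp(beta +/- z * se).
    """
    beta = math.log(hr)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return beta, se


def hr_with_ci(beta, se, z=1.96):
    """Turn a coefficient and standard error back into HR and its interval."""
    return (
        math.exp(beta),
        math.exp(beta - z * se),
        math.exp(beta + z * se),
    )


# Values reported in the abstract for the distant-stage node.
beta, se = log_hr_and_se(5.4, 4.9, 5.9)
hr, low, high = hr_with_ci(beta, se)
```

The reconstructed interval only approximately matches the published limits, because the published values are rounded to one decimal place.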
In our study, the 5-year survival rate was 63.6% and the 10-year survival rate was 55.3% in the PMBT patients. There has been no significant improvement in the 5-year survival of primary bone cancer over the past 25 years.2,19 In the United States, the National Cancer Institute reported an overall 5-year survival of 66%.2,20 In the univariate survival analysis, a lower 5-year survival rate was found in older adults (>65 years), males, patients with tumor sites at the vertebral column, pelvic ring, and sacrum, patients with osteosarcoma or Ewing sarcoma, patients with low income, those living in non-metropolitan areas, and those with high grade, distant stage, metastasis, or two or more malignant tumors. Similarly, Hu et al. 21 found a lower 5-year survival rate in men, older adults (>65 years), patients with tumor sites at the vertebral column, pelvic ring, and sacrum, patients with osteosarcoma or Ewing sarcoma, patients with low income, and those living in non-metropolitan areas.
In our study, consistent with the literature, osteosarcoma was the most frequently observed PMBT in the <15 years and 15–44 years age groups.16,18,22 With the introduction of adjuvant chemotherapy, 10-year survival has increased from 30% to approximately 50%, but there has been no improvement in 10-year survival since the 1990s.22,23 The 10-year survival rate for osteosarcoma in this study was 52.1%.
One of the main limitations of our study is that SEER records do not contain detailed information on specific treatment strategies and their variations. The SEER database also does not include variables such as pathological fracture and surgical margin status, which are known as potential prognostic factors.
Conclusion
The decision tree analysis we used to evaluate mortality in PMBT yielded five terminal nodes among 11 prognostic factors by examining the stage, age, and grade variables concurrently. The implementation of decision trees may prove beneficial in the assessment of the prognosis of PMBT patients. The rules extracted from decision trees offer insights into the risk factors in specific patient populations and can be used by clinicians in their decision-making process for individual patients. We recommend using decision trees in conjunction with Cox regression analysis to identify risk factors and their effects on survival.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Ethical approval
No ethics committee approval nor patient approval forms are necessary to use data on the SEER platform. The SEER research contract was signed online to access the database. All patient identifiers and all personal data in this study are removed from the SEER database. As a result, studies using the SEER database are exempt from institutional review board approval.
Informed consent
No informed consent nor patient approval forms are necessary to use data on the SEER platform.
English language editing
This manuscript was edited by Gazi University Academic Writing, Application and Research Center. The manuscript was edited for organizational flow and proper English grammar, vocabulary, punctuation, and spelling. (Certificate Number: 05.01.2023/002).
