Identification of Prognostic Biomarkers for Breast Cancer Metastasis Using Penalized Additive Hazards Regression Model

Abstract

Background:

Breast cancer (BC) is one of the most common cancers diagnosed in women worldwide. The survival rate of BC patients is strongly affected by metastasis, so exploring its underlying mechanisms and identifying related biomarkers to monitor BC relapse/recurrence with new statistical methods is essential. This study investigated the high-dimensional gene-expression profiles of BC patients using penalized additive hazards regression models.

Methods:

A publicly available dataset on time to metastasis in BC patients (GSE2034) was used. It contains expression profiles of 22 283 genes for 286 BC patients. Penalized additive hazards regression models with different penalties, including LASSO, SCAD, SICA, MCP, and Elastic net, were used to identify metastasis-related genes.

Results:

Five penalties were applied in the additive hazards model and jointly identified 9 genes: SNU13, CLINT1, MAPK9, ABCC5, NKX3-1, CD44, NCOR2, COL2A1, and ZNF219. According to the median of the prognostic index calculated using the regression coefficients of the penalized additive hazards model, the patients were labeled as high/low risk groups. A significant difference was detected between the survival curves of the identified groups. The selected genes were examined using validation data and were significantly associated with the hazard of metastasis.

Conclusion:

This study showed that MAPK9, NKX3-1, NCOR2, ABCC5, and CD44 are potential predictors of recurrence and metastasis in breast cancer and can be taken into account as candidates for further research in tumorigenesis, invasion, metastasis, and epithelial-mesenchymal transition of breast cancer.

Keywords

Gene expression profiling, bioinformatics analysis, breast neoplasm, biomarker, prognosis, metastasis

Introduction

Breast cancer (BC) is one of the most prevalent cancers diagnosed in women worldwide. BC was also reported as the second leading cause of cancer death in the United States in 2021.¹ Patients with BC are at risk of local recurrence as well as axillary lymph node metastasis despite early detection of the disease and successful surgical resection.^2-4 While the 5-year survival rate of BC patients at early stages is over 90%, this rate declines to <30% after the occurrence of metastasis.⁵

Early diagnosis of metastasis after surgery of BC patients is urgent for good prognosis,⁶ because advanced BC with distant metastasis is associated with serious outcomes for the patients. Therefore, creating diagnostic tools to identify individuals who are at recurrence/relapse risk of BC is of great importance, as these tools would help to assure the patient receives appropriate therapy.⁷ Currently, BC patients mainly receive comprehensive treatments (eg, surgery, chemotherapy and radiotherapy as well as hormone therapy, and targeted therapy) as therapeutic approaches.⁸ Exploring non-invasive diagnostic biomarkers helps clinicians to discriminate tumor types and cancer stages, and aids them in developing individualized/personalized therapy plans.⁹ In spite of the comprehensive exploration of etiology and signatures of BC, diagnostic tools for predicting the metastasis/recurrence of BC have not been developed in clinical practice.¹⁰ There is an urgent need for additional exploration of the underlying mechanisms of the BC metastasis and identification of biomarkers to screen BC recurrence.¹⁰

Nowadays, enormous amounts of genomic data are produced by continuous advances in biological research technologies including sequencing which have been widely used in creating diagnostic tools for different diseases. BC has a heterogeneous nature clinically and pathologically and has several prognostic subgroups due to the vast molecular changes. Genomic information provides a better understanding of the diversity observed in BC types and molecular complexity and helps to detect homogeneous subclasses of patients according to their therapeutic response. Molecular classifications should provide clinicians with better treatment options. Also, diagnosis and prognosis of the disease are expected to be improved by the identified biomarkers.^3,11

To date, gene-expression profiles-based classifiers have been developed for detecting BC tumor types.^12,13 However, few studies have considered identification of prognostic markers to predict metastasis. So far, several bioinformatics and machine learning methods have been utilized to model genome-wide data to handle high-dimensional and low sample size problems with this type of data. Especially, when there is survival outcome in clinical information of the patients along with molecular data, the penalized Cox proportional hazards regression model is the first option. In this regard, the primary goal is to select a small subset of genes associated with survival outcomes that can be used to construct predictive models. In the penalized regression model, a penalty term is attached to the likelihood function which shrinks the regression coefficients toward zero. Therefore, variable selection and coefficient estimation are done simultaneously.

Despite extensive research that has been conducted based on multiplicative Cox proportional hazards model for selecting gene expression profiles, few attempts have been made to consider additive risk regression models. Additive hazard models may provide information further than the typical proportional hazards in survival analysis. For example, the former estimates the hazards difference instead of hazard ratio indicating “the absolute difference in the instantaneous failure rate per unit of change in the exposure variable.” Moreover, additive hazards model is more appropriate in the case the proportional hazard assumption is violated.¹⁴ Penalized additive hazards model has been also developed for variables selection in the context of survival analysis and has been applied for selecting gene profiles related to various cancers.^15,16 Nevertheless, to our knowledge, there is no study that analyzes molecular data with survival outcome in BC patients to create prognostic models for this disease using additive risks model. The aim of the present study was to focus on utilization of penalized additive hazards regression model for analyzing high-dimensional time-to-metastasis data, to create a diagnostic model to predict survival time in BC patients, and to determine genes associated with time-to-metastasis.
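The contrast between the two model families can be written out explicitly. The block below is only an illustrative restatement under the model form used later in this paper; the symbol $β$ for the Cox coefficients is introduced here for comparison and is not part of the original notation.

```latex
% Additive hazards model: a one-unit increase in X_j shifts the hazard by gamma_j
h(t \mid X) = h_{0}(t) + γ^{T} X
\quad\Longrightarrow\quad
h(t \mid X_{j} = x + 1) - h(t \mid X_{j} = x) = γ_{j}

% Cox proportional hazards model (for comparison): the same increase scales the hazard
h(t \mid X) = h_{0}(t) \exp(β^{T} X)
\quad\Longrightarrow\quad
\frac{h(t \mid X_{j} = x + 1)}{h(t \mid X_{j} = x)} = \exp(β_{j})
```

In the additive form, $γ_{j}$ is read directly as a hazard difference, whereas in the multiplicative form $\exp(β_{j})$ is a hazard ratio.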

Methods

Data source

In the present study, the dataset with series accession GSE2034 in the NCBI Gene Expression Omnibus (GEO) database (www.ncbi.nlm.nih.gov/geo) was used. Samples (n = 436) were related to patients with lymph-node-negative breast cancer from the tumor bank at the Erasmus Medical Center (Netherlands), and steroid-hormone receptors were measured. Time to relapse and gene expression measurements for 22 283 genes (platform: Affymetrix Human Genome U133A Array) from the 286 samples available from GEO were used in the present study.

An external dataset (GSE26971) was used to validate the results. It contains information on 277 samples, including gene expression profiles and time to relapse (platform: Affymetrix Human Genome U133A Array).

Statistical analysis

Gene selection through penalized additive hazards model

Penalized regression techniques are useful statistical learning methods for variable selection, especially when the sample size is smaller than the number of explanatory variables. In this approach, a penalty is attached to the objective function so that the estimates of the regression coefficients shrink toward zero. In this way, variable selection and coefficient estimation are done at the same time. In the present study, an additive hazards approach¹⁷ was considered as follows:

h (t; X_{1}, \dots, X_{p}) = h_{0} (t) + γ_{1} X_{1} + \dots + γ_{p} X_{p}

where $X_{1}, \dots, X_{p}$ are input variables (here, genes), $γ = (γ_{1}, \dots, γ_{p})$ is the vector of regression coefficients, and $h_{0} (\cdot)$ is an unspecified baseline hazard. The penalized estimator of regression coefficients is obtained by solving the following regularized problem:

\hat{γ} = \underset{γ}{\arg\min} \left\{ \frac{1}{2} γ^{T} V γ - b^{T} γ + \sum_{j = 1}^{p} p_{λ} (| γ_{j} |) \right\}

where $V = \frac{1}{n} \sum_{i = 1}^{n} \int_{0}^{τ} Y_{i} (t) {(X_{i} - \bar{X}) {(X_{i} - \bar{X})}^{T}} d t$ , $b = \frac{1}{n} \sum_{i = 1}^{n} \int_{0}^{τ} {(X_{i} - \bar{X})} d N_{i} (t)$ , and $Y_{i} (t)$ and $N_{i} (t)$ are the at-risk indicator and the observed-failure counting process, respectively.¹⁷ Also, $p_{λ} (\cdot)$ denotes a penalty function (eg, LASSO) with tuning parameter $λ$. Here, we used 10-fold cross-validation to determine the optimum value of $λ$ . Five penalties were considered: (1) Least Absolute Shrinkage and Selection Operator (LASSO); (2) Smoothly Clipped Absolute Deviation (SCAD); (3) Elastic Net (ENet); (4) Smooth Integration of Counting and Absolute Deviation (SICA); and (5) the minimax concave penalty (MCP).¹⁷
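To make the role of $p_{λ}$ concrete, the sketch below writes out commonly used textbook forms of the 5 penalties and evaluates the regularized objective for a given coefficient vector. It is an illustration only: the shape parameters (`a`, `conc`, `alpha`) and their default values, the glmnet-style elastic net mixture, and all function names are assumptions, since the exact parametrizations used in the analysis are not stated here.

```python
def lasso_penalty(t, lam):
    """LASSO: lam * |t|."""
    return lam * abs(t)

def enet_penalty(t, lam, alpha=0.5):
    """Elastic net, glmnet-style mix of L1 and squared L2 (assumed form)."""
    t = abs(t)
    return lam * (alpha * t + (1 - alpha) * t * t / 2)

def scad_penalty(t, lam, a=3.7):
    """SCAD: linear near zero, quadratic transition, then constant (a > 2)."""
    t = abs(t)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2

def mcp_penalty(t, lam, conc=3.0):
    """MCP: concave up to conc * lam, then constant (conc > 1)."""
    t = abs(t)
    if t <= conc * lam:
        return lam * t - t * t / (2 * conc)
    return conc * lam * lam / 2

def sica_penalty(t, lam, a=1.0):
    """SICA: lam * (a + 1) * |t| / (a + |t|), with a > 0."""
    t = abs(t)
    return lam * (a + 1) * t / (a + t)

def penalized_objective(coefs, V, b, penalty, lam):
    """Evaluate 0.5 * g'Vg - b'g + sum_j penalty(|g_j|) for plain lists."""
    p = len(coefs)
    quad = 0.5 * sum(coefs[j] * V[j][k] * coefs[k]
                     for j in range(p) for k in range(p))
    lin = sum(b[j] * coefs[j] for j in range(p))
    return quad - lin + sum(penalty(g, lam) for g in coefs)

# Toy 2-gene example (hypothetical numbers, not taken from the study).
example = penalized_objective([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]],
                              [1.0, 0.2], lasso_penalty, 0.5)
```

SCAD and MCP level off for large coefficients, so strong signals are penalized less than under LASSO.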

Robustness assessment

The data partitioning was repeated 100 times to assess the robustness of the model. Feature selection was rerun for each of the 5 penalties on every partition, and the selection frequency of each gene over the 100 runs was calculated. Gene expression profiles selected in more than 50% of the runs were considered as selected.
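The frequency rule can be sketched as follows. The gene labels and the 4 toy runs are hypothetical placeholders; in the study, each run would be the output of one penalized fit on one data partition, which is not reproduced here.

```python
from collections import Counter

def stable_features(selections, threshold=0.5):
    """Return features selected in more than `threshold` of the runs."""
    n_runs = len(selections)
    # Count each feature at most once per run.
    counts = Counter(g for run in selections for g in set(run))
    return sorted(g for g, c in counts.items() if c / n_runs > threshold)

# Hypothetical selections from 4 repeated partitions.
runs = [{"g1", "g2"}, {"g1"}, {"g1", "g2", "g3"}, {"g2", "g3"}]
kept = stable_features(runs)
```

Here `g3` appears in exactly half of the runs and is dropped, because the rule requires a frequency strictly greater than 50%.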

Inferring survival groups

The prognostic index ( $γ^{T} X$ ) was calculated from the features selected by penalized additive hazards regression, and its median was used as the cutoff point to label the samples as high or low risk. Then, the log-rank test was used to compare the survival curves of the 2 identified groups.
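A minimal sketch of this step is given below, assuming a two-group log-rank test with a chi-square reference distribution on 1 degree of freedom. The coefficients, expression values, and follow-up times are toy numbers invented for illustration, and the function names are not from the original analysis.

```python
import math
from statistics import median

def prognostic_index(coefs, x):
    """Linear score gamma' x for one sample."""
    return sum(c * v for c, v in zip(coefs, x))

def split_by_median(scores):
    """Label samples above the median score as high risk (1), others 0."""
    cut = median(scores)
    return [1 if s > cut else 0 for s in scores]

def logrank_test(times, events, groups):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    obs_minus_exp, var = 0.0, 0.0
    for t in sorted({ti for ti, e in zip(times, events) if e == 1}):
        n = sum(1 for ti in times if ti >= t)                      # at risk
        n1 = sum(1 for ti, g in zip(times, groups) if ti >= t and g == 1)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        d1 = sum(1 for ti, e, g in zip(times, events, groups)
                 if ti == t and e == 1 and g == 1)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / var
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square tail, 1 df
    return chi2, p_value

# Toy data: 6 patients, 2 genes (hypothetical values).
coefs = [0.5, -1.0]
X = [[2, 0], [1.5, 0.2], [1, 0.5], [0.2, 1], [0.5, 1.5], [0, 2]]
times = [5, 8, 12, 20, 25, 30]
events = [1, 1, 1, 0, 1, 0]      # 1 = metastasis observed, 0 = censored
scores = [prognostic_index(coefs, x) for x in X]
groups = split_by_median(scores)
chi2, p_value = logrank_test(times, events, groups)
```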

Software

In this study, the analysis was performed in R (http://www.r-project.org) using an R package provided by Lin and Lv (http://www-scf.usc.edu/~linwei/software.html).¹⁷
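The package itself is not reproduced here. As a rough illustration of how the regularized problem in the Statistical analysis section can be solved for the LASSO penalty, the sketch below uses cyclic coordinate descent with soft-thresholding on the quadratic objective. This is an assumed, simplified solver written in Python for small dense inputs, not the implementation used in the study; `V`, `b`, and `lam` are toy values.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_quadratic(V, b, lam, n_iter=200, tol=1e-10):
    """Minimize 0.5 * g'Vg - b'g + lam * sum|g_j| by coordinate descent.

    Assumes V is symmetric with strictly positive diagonal entries.
    """
    p = len(b)
    g = [0.0] * p
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # Partial residual: b_j minus the contribution of the other coordinates.
            r = b[j] - sum(V[j][k] * g[k] for k in range(p) if k != j)
            new = soft_threshold(r, lam) / V[j][j]
            max_change = max(max_change, abs(new - g[j]))
            g[j] = new
        if max_change < tol:
            break
    return g

# Toy example with 2 correlated covariates (hypothetical numbers).
solution = lasso_quadratic([[2.0, 0.5], [0.5, 1.0]], [1.0, 0.1], lam=0.3)
```

In this toy case the second coefficient is set exactly to zero, which is how the penalty performs variable selection and estimation in one step.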

Results

There was information on 22 283 gene expression profiles for 286 samples. The data were preprocessed and quantile normalization was used to normalize them. Then, the penalized additive hazards model with the 5 penalties described in the Methods section was applied to select genes related to the hazard of metastasis. The response variable was the survival time of the patients. Table 1a lists the 39 gene profiles selected by LASSO, SCAD, ENet, SICA, and MCP. LASSO, ENet, SCAD, SICA, and MCP selected 36, 13, 24, 39, and 10 gene profiles related to the survival of the patients, respectively. There were 9 genes common to the 5 regression methods: SNU13, CLINT1, MAPK9, ABCC5, NKX3-1, CD44, NCOR2, COL2A1, and ZNF219. To investigate the association between the selected genes and the hazard of metastasis, unpenalized additive hazards regression was fitted. The regression coefficient of each identified gene, along with unadjusted (univariate regression) and adjusted (multivariate regression) P-values, is provided in Table 1b. According to the results, all 39 genes had statistically significant associations with the hazard function in the unpenalized univariate additive hazards regression models. According to Table 1b, 18 genes (SNU13, FBXO7, DUSP4, ABCC5, CD44, BTN3A2, CALML3, LILRA1, ZNF257, KCNJ4, SEC24A, CKAP2, ZBTB10, WFDC1, VTCN1, FRAS1, NOL3, and RPL31) had negative regression coefficients and were therefore associated with a reduced hazard of metastasis. The remaining genes were positively associated with the hazard of metastasis in patients with BC.

Table 1.

(a) The results of penalized additive hazards model and (b) regression coefficients from unpenalized additive hazards model for the selected genes from penalized additive hazards model on the original dataset.

Gene symbol	(a)					(b)
Gene symbol	LASSO	Enet	SCAD	SICA	MCP	Beta	S.E	Unadjusted P-value*	Adjusted P-value**
SNU13	*	*	*	*	*	−0.001	0.001	5.21E-05	0.493
FBXO7	*			*		−0.001	0.001	0.000229	0.064
ZFP36L2	*		*	*		0.000	0.001	0.000427	0.472
SMC4	*		*	*		0.000	0.001	4.46E-05	0.899
CLINT1	*	*	*	*	*	0.001	0.001	2.91E-06	0.495
DYRK2	*		*	*		0.000	0.001	8.41E-05	0.636
MAPK9	*	*	*	*	*	0.000	0.001	0.00082	0.944
DUSP4	*		*	*		−0.001	0.001	0.000266	0.077
KDELR3	*			*		0.000	0.001	7.29E-05	0.541
NEK2	*		*	*		0.001	0.001	1.50E-05	0.485
TCL1B	*			*		0.001	0.001	0.00061	0.032
None	*	*	*	*		0.000	0.001	4.23E-05	0.503
ABCC5	*	*	*	*	*	−0.003	0.001	7.76E-05	0.009
NKX3-1	*	*	*	*	*	0.000	0.001	6.12E-05	0.964
CD44	*	*	*	*	*	−0.001	0.001	2.68E-05	0.393
BTN3A2	*			*		−0.001	0.001	0.000803	0.070
CALML3				*		−0.001	0.001	0.000772	0.099
LILRA1	*		*	*		−0.001	0.001	0.000736	0.264
ZNF257	*			*		−0.001	0.001	8.85E-05	0.439
ALDH3B1	*			*		0.000	0.001	0.000128	0.449
KCNJ4	*			*		−0.001	0.001	0.000823	0.028
RB1	*		*	*		0.000	0.001	1.47E-05	0.783
TTI1	*	*	*	*		0.000	0.001	6.31E-05	0.659
SEC24A	*	*	*	*		−0.001	0.001	1.47E-05	0.446
SHC1	*			*		0.000	0.001	0.000114	0.657
NCOR2	*	*	*	*	*	0.000	0.001	0.000114	0.873
TCF7L2	*		*	*		0.001	0.001	0.000367	0.043
ARF1P1	*		*	*		0.001	0.001	0.000123	0.309
COL2A1	*	*	*	*	*	0.001	0.001	4.70E-05	0.041
CKAP2	*			*		−0.001	0.001	0.000375	0.043
WDR70	*			*		0.001	0.001	0.000162	0.225
ZBTB10	*			*		−0.001	0.001	0.000221	0.305
ZNF219	*	*	*	*	*	0.002	0.001	1.75E-05	0.001
WFDC1	*		*	*		−0.002	0.001	0.000558	0.004
VTCN1	*			*		−0.001	0.001	0.000738	0.305
FRAS1	*		*	*		−0.001	0.001	5.95E-05	0.264
MZB1	*			*		0.001	0.001	0.000908	0.214
NOL3				*		−0.001	0.001	0.000852	0.424
RPL31		*	*	*		−0.002	0.001	6.01E-06	0.138
Log rank test	P < 2e-16	P < 2e-16	P < 2e-16	P < 2e-16	P = 3e-16

*Univariate analysis. **Multivariate analysis.

Also, an unpenalized multivariate additive hazards regression model was fitted with all 39 selected genes included simultaneously; the adjusted P-values are given in Table 1b. Eight genes (ZNF219, WFDC1, ABCC5, KCNJ4, TCL1B, COL2A1, TCF7L2, and CKAP2) were statistically significant in this multivariate model (adjusted P < .05). The results of the multivariate unpenalized additive hazards regression model restricted to the 9 common genes are provided in Table 2.

Table 2.

Results of fitting the multivariate additive hazards model using the 9 common genes on the training dataset.

Gene symbol	Coefficient	S.E	Lower.95	Upper.95	Z	Adjusted P-value
SNU13	−0.0019	0.000577	−0.00303	−0.00077	−3.28841	0.001008
CLINT1	0.001047	0.000501	6.42E-05	0.00203	2.088128	0.036786
MAPK9	0.000729	0.000651	−0.00055	0.002005	1.119757	0.262817
ABCC5	−0.00269	0.000906	−0.00447	−0.00092	−2.97405	0.002939
NKX3-1	−0.0011	0.000573	−0.00222	2.53E-05	−1.91581	0.055389
CD44	−0.00203	0.000618	−0.00324	−0.00082	−3.29091	0.000999
NCOR2	−0.00119	0.000528	−0.00222	−0.00015	−2.24803	0.024574
COL2A1	0.001741	0.000566	0.000632	0.00285	3.076194	0.002097
ZNF219	0.002475	0.000561	0.001375	0.003575	4.409521	1.04E-05
Log-rank test: P = 3e-16
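The columns of Table 2 are linked by the usual Wald relations, which can be checked from the coefficient and standard error alone. The sketch below assumes a normal reference distribution and a 1.96 critical value; small differences from the tabulated values are expected because the table entries are rounded.

```python
import math

def wald_summary(coef, se, z_crit=1.959964):
    """Return (Z, lower 95% limit, upper 95% limit, two-sided P-value)."""
    z = coef / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, coef - z_crit * se, coef + z_crit * se, p_value

# ZNF219 row of Table 2: coefficient 0.002475, S.E 0.000561.
z, lower, upper, p_value = wald_summary(0.002475, 0.000561)
```

For this row the computed Z is about 4.41 with limits close to 0.001375 and 0.003575, in line with the tabulated values.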

A breast cancer survival prognostic index

A prognostic index was created based on the 39 genes and the regression coefficients obtained from the penalized additive hazards model. The patients were labeled as high/low risk according to the median of the prognostic index. The median survival time of the low/high risk groups was 161.85 (S.E = 2.88) and 39 (S.E = 6.05) months, respectively. The log rank test indicated a statistically significant difference between the survival curves of the 2 identified groups (P < .001). The prognostic index was also calculated based on the gene subsets identified by each method. The last row of Table 1 presents the results of the log rank test for the survival risk groups based on the genes selected by each of the 5 methods. According to the results, all subsets could significantly distinguish the risk groups. Figure 1 depicts the Kaplan-Meier survival functions, which clearly distinguish the high/low risk groups.

Figure 1.

Kaplan-Meier survival curves for 2 identified risk groups based on 39 selected genes in original data.
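For readers unfamiliar with how such curves and median survival times are obtained, a minimal Kaplan-Meier sketch follows. The follow-up times and event indicators are toy values, not study data, and the function names are illustrative.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (product-limit estimate)."""
    curve, surv = [], 1.0
    for t in sorted({ti for ti, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for ti in times if ti >= t)
        failures = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        surv *= 1 - failures / at_risk
        curve.append((t, surv))
    return curve

def median_survival(curve):
    """First event time at which the survival estimate drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached within follow-up

# Toy follow-up data in months; 1 = metastasis observed, 0 = censored.
times = [3, 5, 8, 12, 20, 25, 30]
events = [1, 1, 1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
med = median_survival(curve)
```

Applying the same estimator separately to the high and low risk groups gives one curve per group, as in the figure.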

External validation

We fitted univariate additive hazards regression models using the validation dataset described above. All 39 genes selected by the penalized additive hazards models in the original dataset were validated in the external dataset, with P-values < .001. So, all selected genes were significantly associated with the hazard of metastasis. Risk groups were also created from these genes by calculating the prognostic index for this data set. Figure 2 shows the Kaplan-Meier survival curves for these 2 groups. The log rank test comparing the survival curves of the 2 risk groups in the validation data set showed a significant difference (P < .001).

Figure 2.

Survival curves for the 2 identified risk groups based on 39 selected genes in the validation data.

Unpenalized additive hazards model based on the 9 common genes

Table 2 shows the results of fitting the unpenalized additive hazards model on the original data set using only the 9 common genes selected by all 5 penalties (LASSO, ENet, SCAD, SICA, and MCP). As seen, all genes except MAPK9 and NKX3-1 were significantly associated with the hazard of metastasis (adjusted P-value < .05). Table 3 shows the results of fitting the unpenalized multivariate additive hazards model on the validation data set. There, NKX3-1, NCOR2, and ZNF219 were significantly associated with the hazard of metastasis.

Table 3.

Results of fitting the multivariate additive hazards model using the 9 common genes on the validation dataset.

Gene symbol	Coefficient	S.E	Lower.95	Upper.95	Z	Adjusted P-value
SNU13	1.23E-06	1.80E-06	−2.31E-06	4.76E-06	0.679566	.496779
CLINT1	1.94E-06	1.38E-06	−7.71E-07	4.64E-06	1.40187	.160954
MAPK9	−2.22E-06	1.80E-06	−5.75E-06	1.30E-06	−1.23634	.216332
ABCC5	−3.57E-07	6.94E-07	−1.72E-06	1.00E-06	−0.51425	.607074
NKX3-1	−8.01E-07	3.80E-07	−1.54E-06	−5.73E-08	−2.1109	.034781
CD44	−7.26E-08	9.94E-07	−2.02E-06	1.88E-06	−0.07299	.941811
NCOR2	−3.89E-06	1.56E-06	−6.94E-06	−8.35E-07	−2.49605	.012558
COL2A1	4.64E-06	2.85E-06	−9.35E-07	1.02E-05	1.631268	.102834
ZNF219	1.22E-05	4.35E-06	3.66E-06	2.07E-05	2.801443	.005087
Log-rank test: P = 3e-7

Initial screening of selected genes and investigating their relationship with breast cancer

In this study, DisGeNET (v7.0) was used for initial screening of the genes selected by the additive approach for their relationship with breast cancer. DisGeNET is a discovery platform that integrates data from GWAS catalogs, animal models, and the scientific literature.¹⁸ MAPK9, NKX3-1, NCOR2, ABCC5, and CD44 were found to be breast cancer-related genes among the 9 common genes selected by the penalized methods.

Discussion

Breast cancer is the most common cancer among females throughout the world. Distant tumor recurrence (metastasis) can involve the whole body of patients and is associated with many cancer-related deaths. As little is known about biomarkers related to metastasis, utilizing advanced statistical models that address high dimensionality in gene expression studies is crucial for advancing our understanding of the mechanisms of metastasis.^19,20 In this study, unlike other studies, an additive hazards regression approach with 5 different penalty functions was utilized to analyze a high-dimensional dataset consisting of gene expression profiles of breast cancer patients and a survival outcome (GSE2034). We used all 5 types of penalties to take advantage of all the methods, as each has different properties that can reveal new biomarkers. The identified genes were used to create a prognostic index and identify high risk and low risk patients. By using DisGeNET, 5 of the 9 common genes (MAPK9, NKX3-1, NCOR2, ABCC5, and CD44) were identified as the most important genes, playing a role in BC metastasis and correlating with distant tumor recurrence/metastasis in breast cancer patients.

Based on the additive hazards regression model, the selected genes can predict metastasis. Higher expression of MAPK9 was associated with an increased hazard of metastasis (ie, shorter metastasis-free survival time), whereas NKX3-1, NCOR2, ABCC5, and CD44 were associated with longer metastasis-free survival time.

The protein encoded by MAPK9 has an essential role in the control of several biological processes such as transcription regulation, proliferation, and apoptosis. This gene is involved in the response to several stimuli. For example, MAPK9 is involved in the intrinsic pathway of apoptosis induction under UV radiation exposure.^21,22 Song et al²³ showed that the expression level of MAPK9 was significantly increased in HR+/HER2− tumors compared to adjacent normal tissue. Zekri et al showed that MAPK9 is involved in the response to tamoxifen in HR+ metastatic BC patients. The transcription factor NKX3.1, a member of the homeodomain family, can modulate the androgen receptor. NKX3.1 plays an important role in the regulation of cellular processes such as survival, differentiation, and proliferation.^24,25 Holmes et al²¹ showed that NKX3-1 inhibits the binding of the estrogen receptor (ER) to DNA, thereby regulating the ER response in breast cancer. The nuclear receptor co-repressor 2 protein, encoded by NCOR2, represses the transcription of target genes through the alteration of epigenetic modifications.²⁶ Moreover, the expression level of NCOR2 was associated with poor prognosis, chemoresistance, distant metastasis, and tumor recurrence of breast cancer.^27,28 ABCC5, a member of the ATP-binding cassette (ABC) transporter family, is involved in the efflux of several endogenous substances, toxins, and therapeutic agents, such as cisplatin, methotrexate, and 5-FU. ABCC5 also has a role in the chemoresistance of several cancers, such as hepatocellular carcinoma, breast cancer, and colorectal cancer.^29-31 CD44, a transmembrane glycoprotein with a role in cellular communication, is considered a cancer stem cell marker. Cancer stemness has been associated with metastasis, epithelial-to-mesenchymal transition, and therapeutic resistance in different cancers, such as breast cancer.^32-34

The performance of variable selection methods, including penalized methods, is data dependent, and there is no gold standard method for users.³⁵ The present study focused on penalized additive hazards regression with the LASSO, SCAD, ENet, SICA, and MCP penalties to identify metastasis-related gene expression profiles. The additive hazards regression model is a useful alternative to Cox’s model for selecting genes related to patient survival in the presence of high-dimensional covariates. The selected genes may provide extra information compared to multiplicative models such as Cox regression. Lin and Lv¹⁷ conducted several simulation studies to investigate the accuracy of penalized additive hazards models in terms of variable selection for high-dimensional survival data (where the number of covariates is much larger than the sample size). According to their results, all the penalties showed sensitivities (ie, selecting the informative variables, or providing a non-zero regression coefficient when the actual coefficient is non-zero) over 0.9. They also showed that SCAD and MCP had greater specificities (ie, not selecting the non-informative variables, or providing a zero regression coefficient when the actual coefficient is zero) than LASSO and ENet, which was due to the form of the penalty terms (the penalty functions of MCP and SCAD are non-convex). They also applied the method to select genes related to survival in diffuse large-B-cell lymphoma (DLBCL) data. A notable feature of the penalized additive hazards model is that it remains applicable when the number of covariates is large. It also provides the risk difference (excess risk), which is more relevant in clinical studies than the risk ratio. In addition, the method is computationally efficient (which is very important in high-dimensional settings), as it uses a least squares method. Nevertheless, due to the additive form of the hazard function, negative hazard estimates might be produced for some individuals, and the resulting least squares estimators may not have the support of likelihood theory.³⁶ Future studies should focus on the properties of estimators based on the maximum likelihood function.
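The sensitivity and specificity of variable selection referred to above can be computed in a simulation, where the true coefficients are known, as in the following sketch. The coefficient vectors are hypothetical and the function name is illustrative.

```python
def selection_sensitivity_specificity(true_coefs, est_coefs):
    """Sensitivity and specificity of selection, treating non-zero as selected."""
    pairs = list(zip(true_coefs, est_coefs))
    tp = sum(1 for t, e in pairs if t != 0 and e != 0)  # informative, selected
    fn = sum(1 for t, e in pairs if t != 0 and e == 0)  # informative, missed
    tn = sum(1 for t, e in pairs if t == 0 and e == 0)  # noise, excluded
    fp = sum(1 for t, e in pairs if t == 0 and e != 0)  # noise, selected
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical simulation: 2 informative and 3 non-informative covariates.
true_coefs = [1.0, 0.0, 0.0, 2.0, 0.0]
est_coefs = [0.8, 0.0, 0.1, 0.0, 0.0]
sens, spec = selection_sensitivity_specificity(true_coefs, est_coefs)
```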

The present study also introduced a new set of influential gene expression profiles in predicting metastasis in breast cancer patients, using an additive hazards model, with a different perspective than the proportional hazards point of view.

Conclusion

Considering the findings of this study, the penalized additive hazards model could identify high/low risk survival subgroups of BC patients. MAPK9, NKX3-1, NCOR2, ABCC5, and CD44 could be used as potential recurrence and metastasis prediction biomarkers in breast cancer. The model identified a small subset of genes that are additively associated with the survival of breast cancer patients. These genes provide different information than those obtained from multiplicative models. It is recommended to use both additive and multiplicative models to take advantage of each and to gain insight into the disease. However, further molecular studies should be performed to validate the role of these genes in tumorigenesis, invasion, metastasis, and epithelial-mesenchymal transition of breast cancer.

Footnotes

Acknowledgements

This work was part of an MSc thesis in Biostatistics. We thank the Research and Technology Deputy of the Hamadan University of Medical Sciences and the Research and Technology Deputy of the Hamedan University of Technology for their approval and technical support of this work.

Funding:

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was partially supported by Hamadan University of Medical Sciences (Grant NO: 1401120210346).

Declaration of Conflicting Interests:

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Authors’ Contributions

LT and OH conceived the research topic. LT, OH, PA, SS, ID and SA explored that idea, performed the statistical analysis, and drafted the manuscript. All authors participated in the interpretations and drafting of the manuscript. All authors read and approved the final manuscript.

Availability of Data and Materials

The datasets are publicly available in the GEO database (accession numbers GSE2034 and GSE26971). All data analyzed during the current study are available from the corresponding author on request.

Ethics Approval and Consent to Participate

This study used a publicly available data set. All methods were carried out in accordance with relevant guidelines and regulations, and the study was approved by the Ethics Committee of the Hamadan University of Medical Sciences (IR.UMSHA.REC.1401.922). The funding body had no role in the design of the study, the collection of data, or the writing of the manuscript.

Consent for Publication

Not applicable.

References

1. Peng, Lin, Jing, et al. A novel seven gene signature-based prognostic model to predict distant metastasis of lymph node-negative triple-negative breast cancer. Front Oncol. 2021;11:746763.

2. Chen, Hoffmann, Liu. Organotropism: new insights into molecular mechanisms of breast cancer metastasis. NPJ Precis Oncol. 2018;2:4-12.

3. Lim, Lee, Choi, et al. A novel prognostic nomogram for predicting risks of distant failure in patients with invasive breast cancer following postoperative adjuvant radiotherapy. Cancer Res Treat Official J Korean Cancer Assoc. 2017;50:1140-1148.

4. Zhu, et al. Breast cancer subtypes predict the preferential site of distant metastases: a SEER based study. Oncotarget. 2017;8:27990-27996.

5. Siegel, Naishadham, Jemal. Cancer statistics, 2012. CA Cancer J Clin. 2012;62:10-29.

6. Takada, Kashiwagi, Asano, et al. Prediction of distant metastatic recurrence by tumor-infiltrating lymphocytes in hormone receptor-positive breast cancer. BMC Womens Health. 2021;21:225-311.

7. Wang, Klijn, Zhang, et al. Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet. 2005;365:671-679.

8. Peart. Breast intervention and breast cancer treatment options. Radiol Technol. 2015;86:535M-558M.

9. Liu, Zhang, Huang, et al. Plasma HSP90AA1 predicts the risk of breast cancer onset and distant metastasis. Front Cell Dev Biol. 2021;9:639596.

10. Cai, Mei, Xiao, et al. Identification of five hub genes as monitoring biomarkers for breast cancer metastasis in silico. Hereditas. 2019;156:12.

11. Sabatier, Gonçalves, Bertucci. Personalized medicine: present and future of breast cancer management. Crit Rev Oncol Hematol. 2014;91:223-233.

12. Ntzani, Ioannidis JP. Predictive ability of DNA microarrays for cancer outcomes and correlates: an empirical assessment. Lancet. 2003;362:1439-1444.

13. Wang, Jatkoe, Zhang, et al. Gene expression profiles and molecular markers to predict recurrence of Dukes' B colon cancer. J Clin Oncol. 2004;22:1564-1571.

14. Xie, Strickler, Xue. Additive hazard regression models: an application to the natural history of human papillomavirus. Comput Math Methods Med. 2013;2013:1-7.

15. Hamidi, Tapak, Jafarzadeh Kohneloo, Sadeghifar. High-dimensional additive hazards regression for oral squamous cell carcinoma using microarray data: a comparative study. Biomed Res Int. 2014;2014:393280.

16. Tapak, Mahjub, Sadeghifar, Saidijam, Poorolajal. Predicting the survival time for bladder cancer using an additive hazards model in microarray data. Iran J Public Health. 2016;45:239-248.

17. Lin DY, Ying. Semiparametric analysis of the additive risk model. Biometrika. 1994;81:61-71.

18. Lin. High-dimensional sparse additive hazards regression. J Am Stat Assoc. 2013;108:247-264.

19. Piñero, Ramírez-Anguita, Saüch-Pitarch, et al. The DisGeNET knowledge platform for disease genomics: 2019 update. Nucleic Acids Res. 2020;48:D845-D855.

20. Weigelt, Peterse, van 't Veer LJ. Breast cancer metastasis: markers and models. Nat Rev Cancer. 2005;5:591-602.

21. McGuire, Brown, Kerin MJ. Metastatic breast cancer: the potential of miRNA for diagnosis and treatment monitoring. Cancer Metastasis Rev. 2015;34:145-155.

22. Yue, López. Understanding MAPK signaling pathways in apoptosis. Int J Mol Sci. 2020;21(7):2346.

23. Song, Dou, Zhang, et al. Public transcriptome database-based selection and validation of reliable reference genes for breast cancer research. Biomed Eng Online. 2021;20(1):124.

24. Lei, Jiao, Xin, et al. NKX3.1 stabilizes p53, inhibits AKT activation, and blocks prostate cancer initiation caused by PTEN loss. Cancer Cell. 2006;9:367-378.

25. Gurel, Ali, Montgomery, et al. NKX3.1 as a marker of prostatic origin in metastatic tumors. Am J Surg Pathol. 2010;34:1097-1105.

26. Mori, Verma, Nakamoto-Matsubara, et al. Low NCOR2 levels in multiple myeloma patients drive multidrug resistance via MYC upregulation. Blood Cancer J. 2021;11:194.

27. Green, Burney, Granger, et al. Prognostic significance of steroid receptor co-regulators in breast cancer: co-repressor NCOR2/SMRT is an independent indicator of poor outcome. Breast Cancer Res. 2008;10:63.

28. Tsai, Huang, Northey, et al. Screening of organoids derived from patients with breast cancer implicates the repressor NCOR2 in cytotoxic stress response and antitumor immunity. Nature Cancer. 2022;3:734-752.

29. Chen, Wang, Gao, et al. Human drug efflux transporter ABCC5 confers acquired resistance to pemetrexed in breast cancer. Cancer Cell Int. 2021;21:136.

30. Huang, Chen, et al. ABCC5 facilitates the acquired resistance of sorafenib through the inhibition of SLC7A11-induced ferroptosis in hepatocellular carcinoma. Neoplasia. 2021;23:1227-1239.

31. Chen, Villeneuve, Jonker, et al. ABCC5 and ABCG1 polymorphisms predict irinotecan-induced severe toxicity in metastatic colorectal cancer patients. Pharmacogenet Genomics. 2015;25:573-583.

32. Hassn Mesrati, Syafruddin, Mohtar, Syahir. CD44: A multifunctional mediator of cancer progression. Biomolecules. 2021;11:1850.

33. Biddle, Mackenzie IC. Cancer stem cells and EMT in carcinoma. Cancer Metastasis Rev. Published online February 3, 2012. doi:10.1007/s10555-012-9345-0

34. Al-Othman, Alhendi, Ihbaisha, Barahmeh, Alqaraleh, Al-Momany BZ. Role of CD44 in breast cancer. Breast Dis. 2020;39:1-13.

35. Kosorok, Fine JP. Additive risk models for survival data with high-dimensional covariates. Biometrics. 2006;62:202-210.

36. Goeman, Putter. Maximum likelihood estimation in the additive hazards model. arXiv preprint arXiv:2004.06156 (2020).