Abstract
Background
Prognostic factors of survival can be identified accurately using data from different health centers, but multi-center data are heterogeneous because patients are treated in different centers, among other reasons. In survival analysis, the shared frailty model is a common way to analyze multi-center data, but it assumes that all covariates have homogeneous effects. We used a censored quantile regression model for clustered survival data to study the impact of prognostic factors on survival time.
Methods
This multi-center historical cohort study included 1785 participants with breast cancer from four different medical centers. A censored quantile regression model with a gamma distribution for the frailty term was used, and a p-value less than 0.05 was considered significant.
Results
The 10th and 50th percentiles (95% confidence interval) of survival time were 26.22 (23–28.77) and 235.07 (130–236.55) months, respectively. The effect of metastasis on the 10th and 50th percentiles of survival time was 20.67 and 69.73 months, respectively (all p-value < 0.05). For tumor grade, the effects of grade 2 and grade 3 tumors, compared with grade 1 tumors, on the 50th percentile of survival time were 22.84 and 35.89 months, respectively (all p-value < 0.05). The frailty variance was significant, which confirmed that there was significant variability between the centers.
Conclusions
This study confirmed the usefulness of a censored quantile regression model for clustered data in studying the impact of prognostic factors on survival time and in controlling for the heterogeneity that arises when patients are treated in different centers.
Introduction
Breast cancer, with approximately 685,000 deaths in 2020, is one of the most common cancers among women worldwide.1 Each year, more than 1.5 million women (25% of all women with cancer) are diagnosed with breast cancer worldwide, and 39% of the confirmed cases occur in Asian countries.2–4 Early diagnosis can lead to a good prognosis and a higher survival rate. Owing to early detection, the 5-year relative survival rate of breast cancer in high-income countries is greater than 80%, whereas in low-income countries it is approximately 40% because of low awareness and late diagnosis.5,6 Several studies have proposed prognostic factors in breast cancer, including age at diagnosis, disease stage, number of involved lymph nodes, tumor size and grade, type of adjuvant treatment, metastasis, and recurrence.7,8 Various studies have been performed in Iran to determine the factors affecting survival in patients with breast cancer and to estimate survival time.9,10 Prognostic factors of survival can be identified accurately using data from different health centers, but multi-center data are heterogeneous because patients are treated in different centers and under the care of different physicians, among other reasons. The variability of survival data can be split into a part that depends on risk factors, and is therefore theoretically predictable, and a part that is initially unpredictable, even when all relevant information is known. Separating these two sources of variability has the advantage that heterogeneity can explain some unexpected results or give an alternative interpretation of some results.
When multivariate survival times are considered, one aims to account for the dependence in clustered event times, for example in the lifetimes of patients within study centers in a multi-center clinical trial, caused by center-specific conditions. A natural way to model the dependence of clustered event times is to introduce a cluster-specific random effect, the frailty. This random effect explains the dependence in the sense that, had the frailty been known, the events would be independent. Ignoring this heterogeneity may lead to misidentification of the prognostic factors and their effects. In survival analysis, a common way to model clustered multi-center data is the shared frailty model. Frailty models are extensions of the proportional hazards model, best known as the Cox model. The shared frailty model assumes that individuals from different centers are independent, whereas individuals from the same center are correlated through a frailty term shared within the cluster.11
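As a point of reference, a shared gamma frailty model of the kind described above is commonly written as follows. The notation (baseline hazard $h_0$, frailty $u_i$ for center $i$, frailty variance $\theta$, covariates $x_{ij}$, coefficients $\beta$) is assumed here for illustration and is not taken from the original text.

```latex
% Assumed notation: subject j in center i; u_i is shared by all subjects in center i.
\[
h_{ij}(t \mid u_i) = u_i \, h_0(t) \exp\!\left(x_{ij}^{\top}\beta\right),
\qquad
u_i \sim \mathrm{Gamma}\!\left(\tfrac{1}{\theta}, \tfrac{1}{\theta}\right),
\quad \mathrm{E}(u_i) = 1, \quad \mathrm{Var}(u_i) = \theta .
\]
```

Under this parameterization (shape and rate both equal to $1/\theta$), $\theta = 0$ corresponds to no between-center heterogeneity, and conditional on $u_i$ the event times within a center are independent.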
The Cox model relies on the proportional hazards assumption and also assumes that all covariates have homogeneous effects. Another limitation is the difficulty of interpreting the hazard ratio, which can lead to misinterpretation of the covariate effect.12 Accelerated failure time (AFT) models give a direct interpretation of the covariate effect but still assume that the covariate effect is homogeneous. A method that does not require the assumptions of classical survival methods is therefore needed. The censored quantile regression (CQR) model does not need to assume homogeneous covariate effects and can directly explain the impact of the covariates on survival time.13 The model builds on quantile regression, introduced by Koenker and Bassett,14 which extends ordinary regression from modeling the conditional mean of the response variable to modeling its conditional quantile function. Because the CQR model provides a more dynamic relationship between covariates and survival time and is easier to interpret than the classic survival models, it can be considered useful for modeling time-to-event data.12 The CQR model was extended with a frailty term to model the quantiles of survival time conditional on a set of covariates when data are clustered, as in multi-center studies.15
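The idea behind quantile regression can be illustrated with the check (pinball) loss, whose minimizer is a sample quantile. The sketch below is a minimal illustration only: it ignores censoring and covariates, the survival times are hypothetical, and the function names are introduced here for the example.

```python
import numpy as np

def check_loss(residuals, p):
    """Check (pinball) loss: p * r for r >= 0, (1 - p) * |r| for r < 0."""
    residuals = np.asarray(residuals, dtype=float)
    return float(np.sum(residuals * (p - (residuals < 0))))

def quantile_by_check_loss(y, p):
    """Return the observed value that minimizes the check loss at level p."""
    y = np.asarray(y, dtype=float)
    losses = [check_loss(y - c, p) for c in y]
    return float(y[int(np.argmin(losses))])

# Hypothetical, fully observed survival times in months (no censoring).
times = [5, 12, 20, 26, 31, 48, 60, 83, 118, 150, 235]

q10 = quantile_by_check_loss(times, 0.10)  # minimizer for the 10th percentile
q50 = quantile_by_check_loss(times, 0.50)  # minimizer for the median
```

With p = 0.50 the loss reduces to half the sum of absolute deviations, so the minimizer is the sample median. Other values of p weight under- and over-predictions asymmetrically and move the minimizer to the corresponding quantile.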
In this study, we used patient data from different medical centers. To determine the prognostic factors that affect the quantiles of survival time in breast cancer, we used the CQR model for clustered survival data to control for center effects.
Materials and Methods
In this historical cohort study, the dataset was collected as ancillary data on 1785 patients with a pathological diagnosis of breast cancer and no missing data (800, 393, 413, and 179 patients from the four health centers), who were referred to four breast cancer medical centers in Tehran, Iran, and completed the follow-up period between 1997 and 2013. The median follow-up for centers C and D was 65.70 and 51 months, respectively, and for centers A and B it was 22.32 and 29 months, respectively. The median follow-up time for the total data was 29.71 months, and the interquartile range was 19–61 months. Survival time was defined as the duration (months) from diagnosis to death from breast cancer. The event was death from breast cancer, and all other deaths were regarded as censored observations. The prognostic factors considered in this study were age at diagnosis (years); tumor characteristics, namely tumor size (<2 cm, 2–5 cm, >5 cm), number of involved lymph nodes (none, 1–3, 4–9, and >9), and grade of malignancy (grades 1–3); type of surgery (modified radical mastectomy (MRM) or breast-conserving surgery (BCS)); adjuvant chemotherapy and radiotherapy; recurrence; and metastasis.
Statistical Analysis
The descriptive characteristics of the patients are shown as the mean (± standard deviation) for continuous variables and frequency (percentage) for categorical variables. We used the CQR model with frailty to determine the prognostic factors affecting the quantiles of survival time in breast cancer.
Quantile Regression Model with Frailty
Let
Suppose there is a fixed k-dimensional parameter vector
We consider a Laplace regression model with a frailty term for the quantiles 0.10, 0.20, 0.30, 0.40, and 0.50, and we use a Bayesian method with Markov chain Monte Carlo (MCMC) sampling to make inferences about the parameters of interest. We assume non-informative prior distributions.
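For orientation, one common way to write a Laplace (quantile) regression model with a cluster-level random effect is sketched below. This is an assumed generic form, not the authors' equation: the symbols $y_{ij}$, $x_{ij}$, $\beta(p)$, $u_i$, $\sigma$, and $\rho_p$ are introduced here, and the scale of the response and the exact parameterization of the gamma frailty are not specified in the text.

```latex
% Assumed notation: y_{ij} is the (possibly transformed) survival time of subject j
% in center i, x_{ij} the covariates, \beta(p) the coefficients for quantile p,
% and u_i the center-level frailty.
\[
\begin{aligned}
y_{ij} &= x_{ij}^{\top}\beta(p) + u_i + \varepsilon_{ij},
\qquad Q_{\varepsilon_{ij}}\!\left(p \mid x_{ij}, u_i\right) = 0, \\
f(\varepsilon \mid p, \sigma) &= \frac{p(1-p)}{\sigma}
\exp\!\left\{-\rho_p\!\left(\frac{\varepsilon}{\sigma}\right)\right\},
\qquad \rho_p(r) = r\,\bigl(p - \mathbb{I}(r < 0)\bigr).
\end{aligned}
\]
```

Here the error follows an asymmetric Laplace distribution whose $p$-th quantile is zero, so $x_{ij}^{\top}\beta(p) + u_i$ is the conditional $p$-th quantile of the response, and maximizing this likelihood is equivalent to minimizing the check loss $\rho_p$.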
The MCMC sampler was implemented interactively using WinBUGS and the R package R2WinBUGS. The MCMC procedure was applied to the breast cancer data: after an initial burn-in of 20,000 iterations of a chain of length 30,000, every 100th MCMC sample was retained from the next 20,000 iterations. We used standard tools in WinBUGS, such as trace and autocorrelation plots, to evaluate the convergence of the generated samples. A p-value < 0.05 was considered statistically significant.
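The burn-in and thinning step described above can be sketched in a few lines. This is an illustration only, not the WinBUGS run: the chain is a simulated autocorrelated series standing in for MCMC draws, its total length of 40,000 (20,000 burn-in plus 20,000 sampled) is an assumption made here, and the helper names are hypothetical.

```python
import numpy as np

def burn_and_thin(chain, burn_in, thin):
    """Drop the first `burn_in` draws, then keep every `thin`-th remaining draw."""
    return np.asarray(chain)[burn_in::thin]

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a one-dimensional series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

# Simulated AR(1) series as a stand-in for strongly autocorrelated MCMC draws.
rng = np.random.default_rng(1)
n_total = 40_000
noise = rng.normal(size=n_total)
chain = np.empty(n_total)
chain[0] = noise[0]
for t in range(1, n_total):
    chain[t] = 0.9 * chain[t - 1] + noise[t]

kept = burn_and_thin(chain, burn_in=20_000, thin=100)
acf_full = lag1_autocorr(chain)   # high: consecutive draws are dependent
acf_kept = lag1_autocorr(kept)    # near zero: thinned draws are nearly independent
```

Keeping every 100th of 20,000 post-burn-in draws leaves 200 samples per chain. Thinning trades sample size for lower autocorrelation, which is what the autocorrelation plots mentioned above are used to check.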
Results
During the follow-up period, 337 patients (18.9%) died from breast cancer and 1448 patients (81.1%) survived. The clinical characteristics and treatment methods according to survival status are presented in Table 1. The mean (standard deviation) age at diagnosis was 48.78 (12.63) years. Of the patients who died, 275 (81.60%) had undergone MRM and 62 (18.40%) BCS; 145 (43.03%) experienced metastasis and 41 (12.17%) recurrence; and 41 (12.17%) received radiotherapy and 226 (67.06%) chemotherapy. Tumor size was <2 cm, 2–5 cm, and >5 cm in 68 (20.18%), 189 (56.08%), and 80 (23.74%) of the deceased patients, respectively. Tumor grade was 1, 2, and 3 in 23 (6.82%), 180 (53.41%), and 134 (39.76%) of the deceased patients, respectively. The number of involved lymph nodes was 0, 1–3, 4–9, and >9 in 65 (19.29%), 81 (24.04%), 96 (28.49%), and 95 (28.19%) of the deceased patients, respectively (Figure 1).

Figure 1. Frequency of clinical characteristics and treatments by survival status (MRM, modified radical mastectomy; BCS, breast-conserving surgery).
Table 1. Profile of patient demographics and clinical characteristics.
N, number; %, percentage; SD, standard deviation; MRM, modified radical mastectomy; BCS, breast-conserving surgery.
Based on the Kaplan–Meier plot, the lowest estimated survival probability at the end of follow-up was 51%. Thus, we considered the 10th, 20th, 30th, 40th, and 50th percentiles of survival time in the CQR model. The 10th, 20th, 30th, 40th, and 50th percentiles (95% confidence interval) of survival time were 26.22 (23–28.77), 48.98 (42.84–58.32), 83 (67.87–93.27), 118.19 (100–158), and 235.07 (130–236.55) months, respectively. The results from the Laplace regression model with frailty are presented in Table 2. The effect of metastasis on the 10th, 20th, 30th, 40th, and 50th percentiles of survival time was 20.67, 39.22, 45.25, 58.10, and 69.73 months, respectively (all p-value < 0.05). An increase in the number of involved lymph nodes decreased the quantiles of survival time. The effects of 1–3, 4–9, and more than 9 involved nodes, compared with no involved nodes, on the 20th percentile of survival time were 20.60, 23.91, and 31.94 months, respectively, and on the 50th percentile were 31.06, 31.86, and 39.14 months, respectively (all p-value < 0.05). The effect of BCS compared with MRM on the 10th, 20th, 30th, 40th, and 50th percentiles of survival time was 9.84, 11.05, 13.55, 16.24, and 16.73 months, respectively (all p-value < 0.05). For tumor grade, the effects of grade 2 and grade 3 tumors, compared with grade 1 tumors, on the 20th percentile of survival time were 23.05 and 31.27 months, respectively, and on the 50th percentile were 22.84 and 35.89 months, respectively (all p-value < 0.05). Frailty was significantly different from zero at the 20th, 30th, 40th, and 50th percentiles (all p-value < 0.05), which affirmed that there was significant variability between the centers at these percentiles.
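The link between the Kaplan–Meier curve and the range of estimable percentiles can be illustrated with a small sketch. It is a generic illustration on hypothetical data, not the study's analysis: the p-th percentile of survival time is taken as the first event time at which the estimated survival falls to 1 − p or below, and it is undefined when the curve never falls that far. Function names and data are assumptions made for this example.

```python
import numpy as np

def kaplan_meier(times, events):
    """Return distinct event times and the Kaplan-Meier survival estimate at each."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    event_times = np.unique(times[events])
    surv, s = [], 1.0
    for t in event_times:
        at_risk = np.sum(times >= t)              # still under observation at t
        deaths = np.sum((times == t) & events)    # events occurring exactly at t
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return event_times, np.array(surv)

def km_quantile(times, events, p):
    """Smallest event time with S(t) <= 1 - p, or None if S never falls that low."""
    event_times, surv = kaplan_meier(times, events)
    reached = np.nonzero(surv <= 1.0 - p)[0]
    return float(event_times[reached[0]]) if reached.size else None

# Hypothetical follow-up times (months) and event indicators (1 = death, 0 = censored).
times = [5, 8, 12, 20, 25, 30, 40, 50, 60, 60]
events = [1, 0, 1, 1, 0, 1, 0, 0, 0, 0]

q25 = km_quantile(times, events, 0.25)
q40 = km_quantile(times, events, 0.40)
q50 = km_quantile(times, events, 0.50)  # curve ends at 0.54, so the median is not reached
```

In this toy dataset the survival estimate ends at 0.54, so percentiles up to the 46th can be read from the curve while the median cannot. The higher the survival probability at the end of follow-up, the narrower the range of percentiles the data can inform.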
Table 2. Effect of prognostic factors on the survival time quantiles based on the results of the Laplace regression model with frailty.
Note: Coef., estimated parameter; SE, standard error; MRM, modified radical mastectomy; BCS, breast-conserving surgery. *p-value < 0.05.
The plots of the estimated CQR coefficients and their 95% credible intervals (CI) for p ∈ {0.10, 0.20, …, 0.50} are displayed in Figure 2. The plot for the metastasis parameter shows that the estimated coefficient increases as the quantile increases.

Figure 2. The effect of variables on the 10th, 20th, 30th, 40th, and 50th percentiles of survival time based on the Laplace regression model. Dashed lines represent the 95% credible interval for the estimated effect.
Discussion
In this multi-center study, we used a Laplace regression model with frailty to study the impact of prognostic factors on survival time. We used data collected from four health centers in Tehran, Iran. Based on the results of Laplace regression with frailty, the type of surgery, metastasis, radiotherapy, number of involved nodes, and tumor grade were prognostic factors for survival in all quantiles. The frailty term had a significant effect on the model, which shows the necessity of using the frailty model to control for variation between centers due to unmeasured covariates. In survival analysis, data may be correlated or grouped because of shared genes, a shared environmental background, or a multi-center design. A common method to accommodate clustered data is the shared frailty model.16 The shared frailty model assumes that individuals from different clusters are independent, whereas individuals in the same cluster share a frailty term.16
In survival studies, researchers often use the Cox model to study prognostic factors of survival time. The CQR model builds on quantile regression, introduced by Koenker and Bassett,14 which extends ordinary regression from modeling the conditional mean of the response variable to modeling its conditional quantile function.17 The CQR model does not require the assumption of homogeneous covariate effects and can directly explain the impact of covariates on the time to event.18 The main advantage of the CQR model is that it can predict the distribution of time to event, a capability the Cox model lacks.18 Using a multivariate CQR model, clinicians and medical researchers can demonstrate the risk of events of interest over time, which cannot be measured with the Kaplan–Meier method.13 In fact, the CQR model estimates the quantile function of the time to event, so that each quantile describes a different phase of survival.13
To our knowledge, only a limited number of studies have used CQR to investigate the prognostic factors for quantiles of survival in breast cancer. Faradmal et al used the CQR model to better understand the multivariate association between prognostic factors and survival time.19 They showed that changes in age at diagnosis, number of involved lymph nodes, and tumor size could significantly change the median and some other quantiles of survival time. Yazdani used the Cox frailty model with a gamma distribution for the frailty term and showed that type of surgery, number of involved lymph nodes, metastasis, radiotherapy, and tumor grade were prognostic factors for survival in breast cancer.10 In this multi-center study, age at diagnosis was significantly associated with the 10th and 40th percentiles of survival time. In our study, survival time decreased as the number of involved axillary lymph nodes and the tumor grade increased. Previous studies on the effect of tumor characteristics on survival were based on cohorts of patients with breast cancer diagnosed in 2004 at the latest, before the change to more recent systemic therapy had occurred.20–22 Our study showed that traditional prognostic factors, such as the number of positive lymph nodes, are still the main prognostic factors for survival in the current era of new systemic therapies. These results show that the improvement in survival can mainly be explained by the effect of both earlier diagnosis as a result of breast cancer screening and awareness of better treatment options. Surgery is critical for survival, and despite corrections for staging, age, and adjuvant therapy, breast-conserving therapy can provide a better survival rate than mastectomy.10 In our study, the percentiles of survival time for patients who underwent BCS were significantly higher than for patients who underwent MRM.
One limitation of this study is that some variables, such as family history, marital status, estrogen and progesterone receptor status, and other factors relevant to health studies, were not measured; therefore, we could not analyze them in our multi-center study. Based on this multi-center study, early diagnosis of cancer before the involvement of lymph nodes and the onset of metastasis, together with timely treatment, can lead to a longer life and increase the quality of life of patients.
Footnotes
Authors’ Contributions
AY, HZ and MY designed the model and the computational framework and analyzed the data. AY wrote the manuscript with support from HZ and MY. SH and AK contributed to sample preparation. All authors discussed the results and contributed to the final manuscript. All authors have read and approved the manuscript.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethical Statement
This study was approved by the Ethics Committee of the School of Public Health & Allied Medical Sciences, Tehran University of Medical Sciences. Approval ID: IR.TUMS.SPH.REC.1397.212.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
