Abstract
This narrative review explores the burgeoning field of Artificial Intelligence (AI)-driven Breast Cancer (BC) survival prediction, emphasizing the transformative impact on patient care. From machine learning to deep neural networks, diverse models demonstrate the potential to refine prognosis accuracy and tailor treatment strategies. The literature underscores the need for clinician integration and addresses challenges of model generalizability and ethical considerations. Crucially, AI’s promise extends to Low- and Middle-Income Countries (LMICs), presenting an opportunity to bridge healthcare disparities. Collaborative efforts in research, technology transfer, and education are essential to empower healthcare professionals in LMICs. As we navigate this frontier, AI emerges not only as a technological advancement but as a guiding light toward personalized, accessible BC care, marking a significant stride in the global fight against this formidable disease.
Introduction
Breast cancer (BC) is one of the major contributors to the global healthcare burden. Trends have shown that global BC incidence and mortality continue to rise every year. 1 Despite recent advancements in early detection, which are linked to better survival outcomes, BC remains difficult to curtail. Thus, it is imperative to explore innovative solutions to improve BC care in the contemporary, technologically advanced era.
Artificial intelligence (AI) refers to the generation of near-human intelligence processes by computers through iterative machine learning. AI stands at the forefront of these innovations and has resulted in a paradigm shift in healthcare. It has become a cornerstone of modern cancer care. In BC, AI offers promising avenues, from enhancing early detection to delivering efficient, personalized treatment strategies. In BC radiomics alone, AI-based image recognition models have helped radiologists improve diagnostic accuracy. 2 Machine learning has enormous potential for deciphering large datasets to predict survival. This will help clinicians optimize BC care by identifying key factors affecting survival rates. In response to this need, various AI models have been developed to generate algorithms for real-world use. For instance, Kalafi et al. 3, in their analysis of 4902 patients with BC, concluded that machine learning improved BC survivability predictions. According to these models, tumor size, stage, number of axillary lymph nodes removed, and number of lymph nodes positive for malignancy were the most impactful determinants of survivability. 3 It is essential for medical and surgical oncologists not only to acknowledge these innovative predictive tools but also to understand their utility and incorporate AI-generated outcome prediction early on in their clinical settings. A recent survey study from Turkey, which examined the perspectives of 165 medical oncologists on AI in cancer care, found that almost half of the participants had no experience with AI technology, and about 44% had concerns about the reliability of the results. 4
The predictive potential of these tools is not limited to early-stage BC. Researchers from South Korea showed remarkable results from their AI-based predictive tool in metastatic BC patients, especially in those with initial bone metastases.5,6 However, these tools lack generalizability, and decision-making should still incorporate clinical judgment. In addition, ethical considerations around patient privacy, data security, and the responsible use of AI algorithms need to be addressed.
This review offers insights into the transformative potential of AI-driven frameworks in BC survival prediction, which could reshape the landscape of future BC care. Updating medical and surgical oncologists on cutting-edge AI-based predictive tools in BC care, and their clinical utility in real-world settings, lays a foundation for a future where personalized BC care becomes more efficient and accessible.
Materials and Methods
In this narrative review, a comprehensive analysis of recent research studies was conducted to explore the application of AI in BC survival prediction. Relevant articles, published between 2021 and 2023, were selected based on systematic literature searches in academic databases (PubMed, Google Scholar, Embase). Inclusion criteria encompassed studies that utilized AI techniques, such as machine learning, deep learning, and bioinformatics, for BC survival prediction, considering various subtypes, stages, and clinical settings. The keywords used to retrieve relevant articles included “Breast Cancer,” “Survival Prediction,” “Artificial Intelligence,” “Machine Learning,” and “Deep Learning.” Title, abstract, and full-text screening of the articles was performed to include original articles published within the specified time period that were based on survival or prognostic prediction models.
Once the screening of the articles was completed, the remaining articles were reviewed for pertinent data. Information regarding the author(s), publication year, type of study, AI-based model, sample size, data sources, research objectives, results, and conclusions was extracted and synthesized. Limitations, as mentioned in the individual studies, were also discussed to provide a comprehensive overview of the scope and potential biases in the research. The review aims to provide valuable insights into the transformative potential of AI-driven frameworks in BC survival prediction and their implications for the future of BC care.
Discussion
Breast cancer is the most commonly diagnosed cancer worldwide, accounting for 11.7% of newly diagnosed cancers according to GLOBOCAN 2020 estimates. It is the leading cause of female cancer deaths worldwide. 7 In 2022, 43 250 female deaths in the United States were attributed to BC. 8 With increasing BC prevalence, survival and prognosis prediction can play a vital role in BC care, with emphasis on tailored, patient-specific treatment strategies that are more effective in reducing mortality. Traditionally, epidemiological analyses such as generalized linear models have been used to identify prognostic factors for different diseases. However, these methods have proven insufficient in the context of non-linear and complex interactions and a large set of predictors. 9 With the introduction of AI in healthcare, numerous machine learning and neural network models have been developed that have transformed the landscape of cancer prognosis. These new prognostic tools have vastly improved accuracy, assisting clinicians in making appropriate decisions with regard to patient care. 10 Table 1 explores the studies conducted with respect to BC survival prediction using AI models.
Table 1. AI-based models for breast cancer prognosis: summary of studies and predictive outcomes.
Abbreviations: AUC, area under the curve; BCBM, brain metastatic breast cancer; BMBC, bone metastatic breast cancer; CDI, cell death index; CGAN, conditional generative adversarial network; CNN, convolutional neural network; DL, deep learning; ER, estrogen receptor; GSE, gene expression omnibus series; GSEA, gene set enrichment analysis; HER2, human epidermal growth factor receptor 2; HR+, hormone receptor positive; KM-plotter, Kaplan-Meier plotter; METABRIC, Molecular Taxonomy of Breast Cancer International Consortium; PR, progesterone receptor; SEER, surveillance, epidemiology, and end results; TME, tumor microenvironment; TNBC, triple-negative breast cancer; TRG, TME-related gene; WGCNA, weighted gene co-expression network analysis.
Nguyen et al. used several prognostic factors to develop prediction models incorporating machine learning. These factors were not limited to cancer characteristics but also included demographic information, comorbidities, drug history and certain laboratory markers. While machine learning models showed a relatively high Area Under the Curve (AUC) of up to 0.83 (voting classifier model), the highest AUC (0.95) was observed with the Artificial Neural Network (ANN) model, which achieved 90% accuracy and identified cancer stage and tumor size as the most significant features. 8 Several other deep learning methods have been used to create survival prediction algorithms incorporating clinicopathological and genetic data of patients.15-19
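Because AUC is the headline metric throughout the studies discussed here, a minimal sketch of what it measures may be useful. The function below computes AUC as the probability that a randomly chosen positive case receives a higher predicted risk than a randomly chosen negative case (ties count as half). The function name, labels, and scores are hypothetical illustrations and are not taken from any of the reviewed studies.

```python
def roc_auc(labels, scores):
    """Rank-based AUC: P(score of a positive > score of a negative), ties = 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = death within follow-up, 0 = survived;
# scores are a model's predicted risk for each patient.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so it describes how well a model ranks patients by risk rather than how well calibrated its probabilities are.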
Multimodal data for improved models
These models have also highlighted the importance of multimodal data, which yield more accurate survival predictions than unimodal tools that use only clinical data or only gene expression. Two such models integrated clinical data with gene expression and Copy Number Alteration (CNA) data to generate survival and prognosis predictions. Both were two-stage models that first used Convolutional Neural Networks (CNN) to extract features from imbalanced data. To handle the imbalance, Han Yuan et al utilized oversampling, whereas Arya et al used the stacked ensemble method. The extracted features were then fed into a gated multimodal unit 19 and a random forest model, 17 respectively, where all the inputs were processed to generate the final prediction results. Both models demonstrated superior predictive performance compared to other unimodal and multimodal tools that adopted a Deep Neural Network (DNN) instead of a CNN, which can extract a more extensive array of hidden features.17,19 Yuan et al also compared the 5-year survival predicted by their Deep Multi-Modal Fusion Network (DMMFN) with that of the stacked ensemble method proposed by Arya et al and found similar results (AUC = 0.964 versus 0.955). 17
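As an illustration of the oversampling idea mentioned above, the following is a minimal sketch of random oversampling, in which minority-class samples are duplicated at random until every class matches the size of the largest class. This is a generic scheme and an assumption on our part; it is not necessarily the procedure used in the cited studies, and the function name and toy data are hypothetical.

```python
import random


def random_oversample(features, labels, seed=0):
    """Duplicate minority-class rows at random until all classes are equal in size."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    by_class = {}
    for x, y in zip(features, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(rows) for rows in by_class.values())
    out_x, out_y = [], []
    for y, rows in by_class.items():
        # Draw extra rows with replacement from the same class.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for x in rows + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Hypothetical toy data: 6 long-term survivors (0) versus 2 short-term survivors (1).
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.9], [1.0]]
y = [0, 0, 0, 0, 0, 0, 1, 1]
X_bal, y_bal = random_oversample(X, y)
print(y_bal.count(0), y_bal.count(1))
```

Oversampling is normally applied to the training split only, so that duplicated samples do not leak into the validation or test data and inflate the reported performance.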
Prognostic models for different tumor subtypes
With the evolution of AI in BC, its role in the management of different cancer subtypes has also been investigated. This is notably important for Triple-Negative Breast Cancer (TNBC), which is the most invasive subtype; fewer than 30% of patients with metastatic disease survive 5 years. 11 Therefore, the contributions of AI in TNBC have been most pertinent in exploring novel management options that can impact the prognosis of patients. While surgery remains the mainstay of treatment, these tools can allow identification of personalized treatment strategies for patients based on prediction of their tumor microenvironment. Zou et al derived a Cell Death Index (CDI) using 12 different programmed cell death patterns and used it to establish multiple algorithms, including a prognostic nomogram, by consolidating them with clinical data using Cox and regression analyses. Relationships with different immune therapies and immune microenvironments were evaluated, which allowed them to predict drug sensitivities. 6 Another study used Weighted Gene Co-expression Network Analysis (WGCNA) to identify multiple genes by comparing TNBC to normal tissues and used techniques like LASSO to derive key genes and their association with overall and disease-free survival. 12 Gou et al also assessed immunotherapy response and its association with tumor microenvironment scores. This study identified 20 genes that influence immune cell infiltration and are associated with immunotherapy efficacy. 18
Deep learning methods have also been applied to other tumor subtypes, mostly Hormone Receptor Positive/Human Epidermal growth factor Receptor-2 Negative (HR+/HER2−) patients, with emphasis on consolidating clinical features with multi-omics data. While the use of only clinical data may introduce a certain level of clinician bias into the model, the incorporation of multi-omics data can minimize that. Because HR+/HER2− is the most common tumor subtype, understanding its multi-omics characteristics and pathological morphology can significantly influence the precision of BC management. Hu et al generated whole slide images from histological slides of surgical patients to develop a neural network with the capacity to predict a variety of features including clinicopathological factors, gene mutations, biological pathways, immunotherapeutic markers, and relapse-free survival. High predictive accuracy was seen for histologic grade and certain molecular markers. 16
Genomics and prognosis prediction
Another study combined high-dimensional gene expression with miRNA expression and clinical data to derive prognosis prediction via a deep neural network, AutoSURV. This model was able to extract latent features from the omics and clinical data provided, predicting patient-specific prognostic indices. Several genes, miRNAs and pathways associated with high- and low-risk patients were also identified, creating a potential for targeted therapies for these patients. 15 Zhang et al also proposed a Deep Bayesian Perturbation Cox Network (DBP) that uses prior knowledge from censored data to reduce the estimation bias seen in other Cox neural networks. This was applied to multiple genomic datasets to identify different genes and pathways. This allowed the incorporation of non-linear functions, which is not possible with conventional Cox methods, so a larger set of censored data can also be utilized, resulting in a more accurate predictive model. A generative adversarial network (GAN), a DNN that uses training data to generate new data that emulates the training set, has also been used to create a prognostic tool, PregGAN. It sets itself apart from a conventional DNN in that it employs adversarial training, where a discriminator is used to check whether the data generated by the generator is correct. This model also used a gradient sampling strategy to eliminate the instability seen in other conditional GAN models in the literature. This model proved to be highly effective, with the greatest accuracy (90.6%) and AUC (0.946) as compared to several other models. 10
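To make the role of censored data in Cox-type networks more concrete, the sketch below computes the negative log partial likelihood that Cox models, and neural networks built on them, commonly minimize. Censored patients do not contribute an event term of their own, but they remain in the risk set of every event that occurs before their censoring time. This is a generic formulation (ties are handled as in the Breslow approximation) and not the specific loss of any model cited above; the function name and toy values are hypothetical.

```python
import math


def cox_neg_log_partial_likelihood(times, events, risk_scores):
    """Negative log Cox partial likelihood.

    times: follow-up time per patient
    events: 1 if the event (death) was observed, 0 if censored
    risk_scores: model output per patient (log relative hazard)
    """
    nll = 0.0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            continue  # censored patients add no event term of their own
        # Risk set: everyone still under observation at time t_i,
        # including patients who are censored later.
        denom = sum(math.exp(r) for t, r in zip(times, risk_scores) if t >= t_i)
        nll -= risk_scores[i] - math.log(denom)
    return nll


# Hypothetical toy cohort: events at t=5 and t=3, one patient censored at t=8.
times = [5, 3, 8]
events = [1, 1, 0]
baseline = cox_neg_log_partial_likelihood(times, events, [0.0, 0.0, 0.0])
good = cox_neg_log_partial_likelihood(times, events, [0.0, 1.0, -1.0])
bad = cox_neg_log_partial_likelihood(times, events, [0.0, -1.0, 1.0])
print(baseline, good, bad)
```

In this toy example the loss is lowest when the patient who died earliest receives the highest risk score, which is the ordering a well-trained model should learn.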
Survival prediction in metastatic cancer
Certain studies have also used the XGBoost model to predict survival in patients with bone and brain metastases. Comparisons were made between patients who underwent neoadjuvant chemotherapy and surgery versus those who underwent chemotherapy alone. Since prognosis is the primary outcome driving management decisions in patients with metastases, these models are paramount in BC research. XGBoost was considered highly effective due to its ability to minimize the loss function and had the highest AUC in both studies when compared to other prediction models, with a 3-year survival AUC of 0.798 for patients with bone metastases and 0.803 for those with brain metastases. Significantly higher survival rates were also observed in patients with bone metastases who received neoadjuvant chemotherapy plus surgery and in patients with brain metastases who underwent surgery.6,13
All these tools open a variety of avenues that can shift the paradigm of survival prognosis; however, more comprehensive research is warranted so that these tools can be applied to larger and much more diverse datasets that incorporate patients from an array of settings, including different ethnicities and cultures. This would require these tools to first be validated on multiple larger datasets and possibly compared to each other. 19 Models that have been developed using retrospective data may need prospective testing and validation before they can be generally applied. These innovations, however, can serve as a cornerstone for revolutionizing BC care and survival prediction, with the potential for highly personalized and tailored management approaches.
Survival predictions with visual language models and deep learning
More recently, Visual Language Models (VLMs) using multimodal data from sources like Picture Archiving and Communication Systems (PACS) and Electronic Health Records (EHRs) are emerging as powerful tools. These models demonstrate superior performance in BC survival risk assessment, with the CBIS-DDSM dataset showing an increase in the AUC from 0.867 to 0.902 during validation and from 0.803 to 0.830 for the official test set. The EMBED dataset saw AUC improvements from 0.780 to 0.805 during validation, and for BI-RADS 3 cases, AUC improved from 0.91 to 0.96 on the CBIS-DDSM test set and from 0.79 to 0.83 on a challenging validation set. 20 Additionally, deep learning algorithms are enhancing BC tumor grading, crucial for predicting patient survival. Traditional grading by pathologists shows significant inter-observer variation, prompting computer-based methods. A study on 706 young invasive BC patients showed a deep learning-based VLM grading model achieved a Cohen’s Kappa of 0.59 (80% accuracy) in distinguishing tumor grades. Survival analysis indicated significant differences in overall survival (OS) and disease/recurrence-free survival (DRFS/RFS) (P < .05) between the predicted grade groups. 21 Another VLM approach approximated BC intrinsic molecular subtypes (IMS) using whole-slide images of H&E-stained biopsy sections, bypassing the need for molecular testing like PAM50. An algorithm trained on 443 tumors classified patches into 4 major molecular subtypes and was validated on 222 tumors, correctly subtyping most samples and highlighting intratumoral heterogeneity. Patients with heterogeneous tumors had intermediate survival outcomes and varied hormone receptor expression, suggesting these methods can enhance detection, analysis of tumor heterogeneity and survival prediction. 22
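Cohen's Kappa, reported above for the grading model, corrects raw agreement for the agreement expected by chance, which is why a kappa near 0.6 can accompany an accuracy of about 80%. The sketch below computes the unweighted statistic for two sets of grades. The grades are invented for illustration, the function name is hypothetical, and the cited study may have used a weighted variant.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa: (observed - expected) / (1 - expected)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each rater's marginal grade frequencies.
    expected = sum(count_a[k] * count_b[k] for k in count_a.keys() | count_b.keys()) / n**2
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical tumor grades (1-3) from a pathologist and from a model.
pathologist = [1, 1, 2, 2, 3, 3, 2, 1, 3, 2]
model = [1, 2, 2, 2, 3, 3, 2, 1, 1, 2]
accuracy = sum(a == b for a, b in zip(pathologist, model)) / len(model)
kappa = cohens_kappa(pathologist, model)
print(accuracy, round(kappa, 3))
```

In this toy example the raw agreement is 0.8 while kappa is lower, because part of the agreement would be expected from the grade frequencies alone.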
Despite the promising advancements in AI-driven BC survival prediction models, there are several limitations that need to be acknowledged. Firstly, the generalizability of these models may be constrained by the diversity of patient populations and datasets used in the studies. Many of the existing models have been developed and validated on specific cohorts, often from a single geographic region, limiting their applicability to more diverse populations. Additionally, the retrospective nature of some studies introduces the potential for selection bias, as the models are trained on historical data that may not fully represent the current patient demographics and treatment landscape. The reliance on retrospective data also raises concerns about the temporal relevance of these models, especially considering the evolving nature of breast cancer diagnostics and therapeutics. Ethical considerations regarding patient privacy and data security remain crucial, and the responsible use of AI algorithms in healthcare settings necessitates continuous attention. Furthermore, the limited experience and concerns about the reliability of AI technology among medical professionals, as highlighted in a survey study, underscore the need for comprehensive training and education to ensure successful integration of AI-based predictive tools into clinical practice. Moving forward, addressing these limitations will be essential to enhance the robustness, applicability, and ethical considerations of AI-driven breast cancer survival prediction models.
AI illuminates pathways to precision breast cancer care: a beacon for LMICs
As we stand at the intersection of technological innovation and healthcare, the AI-driven frameworks in BC survival prediction emerge as beacons guiding the way toward a future of tailored and efficient care. These models, with their roots firmly grounded in machine learning and deep learning processes, hold transformative potential for the global fight against BC. While the literature reviewed highlights their significant impact on prognosis prediction, it also underscores the imperative of overcoming certain limitations for a more inclusive future, particularly in LMICs. The AI revolution presents an opportunity to bridge healthcare disparities by offering cost-effective, data-driven solutions that can be adapted to diverse patient populations. However, to fully unleash this potential, there is a pressing need for collaborative efforts in research, technology transfer, and education to empower healthcare professionals in LMICs with the tools and knowledge required to integrate AI into their clinical settings. As we navigate this frontier, the promise of AI in BC survival prediction becomes not just a technological advancement but a guiding light toward personalized and accessible healthcare, ensuring that no one is left in the shadows of the fight against this formidable disease.
Conclusion
In conclusion, the literature on AI-driven BC survival prediction reflects a promising trajectory in revolutionizing patient care. These innovative models, ranging from machine learning to DNNs, showcase their potential to enhance accuracy in prognosis, guide personalized treatment strategies, and shape the future landscape of BC care. While remarkable strides have been made, challenges such as model generalizability, ethical considerations, and the need for widespread clinician adoption must be addressed. Moreover, the potential impact of these advancements is substantial, especially in LMICs, where the integration of AI could offer cost-effective, data-driven solutions to improve BC outcomes. The collective efforts to overcome limitations and foster global collaboration will be pivotal in harnessing the full transformative potential of AI in BC survival prediction, ultimately paving the way for a more precise, accessible, and patient-centric approach to managing this complex disease.
Footnotes
Acknowledgements
Not applicable.
Authors’ Contributions
Saad Nasir: Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Software, Validation, Writing—original draft, Writing—review & editing.
Aiman Arif: Data curation, Formal analysis, Investigation, Methodology, Writing—original draft, Writing—review & editing.
Wajiha Khan: Data curation, Methodology, Writing—original draft.
Yasmin Abdul Rashid: Data curation, Formal analysis, Project administration, Supervision, Writing—review & editing.
Lubna M. Vohra: Conceptualization, Formal analysis, Project administration, Resources, Supervision, Validation, Visualization, Writing—review & editing.
Availability of Data and Materials
The datasets analyzed in the current review are present in tabulated form in the main article.
Ethics Approval
This is a review article and therefore no ethical approval was required as per institutional guidelines since there was no patient involvement.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Consent for Publication
Not applicable.
Use of AI Software
Not applicable.
