Abstract
Background:
Breast cancer (BC) screening with mammography reduces mortality but currently considers only age as a risk factor. Personalized risk-based screening has been proposed as a more efficient alternative, and it requires risk prediction tools. Genome-wide association studies have identified numerous genetic variants (single-nucleotide polymorphisms [SNPs]) associated with BC. The effects of these SNPs can be combined into a polygenic risk score (PRS) that serves as a risk prediction tool.
Objectives:
We aimed to develop a clinical-grade PRS test suitable for BC risk-stratified screening with clinical recommendations and implementation in clinical practice.
Design and methods:
In the first phase of our study, we gathered previously published PRS models for predicting BC risk from the literature and validated them using the Estonian Biobank and UK Biobank data sets. We selected the best performing model based on prevalent data and independently validated it in both incident data sets. We then conducted absolute risk simulations, developed risk-based recommendations, and implemented the PRS test in clinical practice. In the second phase, we carried out a retrospective analysis of the PRS test’s performance results in clinical practice.
Results:
The best performing PRS included 2803 SNPs. The C-index of the Cox regression model associating BC status with PRS was 0.656 (SE = 0.05) with a hazard ratio of 1.66. The PRS can identify individuals with more than a 3-fold risk increase. A total of 2637 BC PRS tests have been performed for women between the ages of 30 and 83. Results in clinical use overlap well with expected PRS performance, with 5.7% of women having more than 2-fold and 1.4% having more than 3-fold higher risk than the population average.
Conclusion:
The PRS test separates different BC risk levels and is feasible to implement in clinical practice.
Introduction
Breast cancer (BC) is the leading cause of cancer deaths in women. Each year, there are 2 million new diagnoses and more than 600 000 deaths. 1 Breast cancer screening with mammography reduces BC mortality risk by 20% to 40%.2-4 Current BC screening guidelines are mostly based on age only and do not support regular screening of women below the age of 50. In most European countries, women aged 50 to 69 years are invited to BC screening at 2-year intervals.5,6 Such an approach does not account for the wide variation in individual women’s BC risks and disregards both younger women with a higher risk and women over age 50 with higher risk levels who could benefit from personalized screening. Risk-based screening, in which individualized risk assessment is used to inform screening practices, has been proposed as an alternative to age-based screening.7,8
Around 30% of the total BC risk has been shown to be hereditary. 9 Genetic factors include rare pathogenic variants (PV) in high- and moderate-risk cancer predisposition genes (BRCA1, BRCA2, etc), which have effects large enough to warrant monogenic testing.10-12 However, only a fraction (5%-10%) of BC cases are caused by these rare PVs. 13 A considerable part of BC risk variation is explained by variants outside these high-risk genes in the form of BC-associated common single-nucleotide polymorphisms (SNPs), identified by genome-wide association studies (GWAS).14,15 A polygenic risk score (PRS) is the combined effect of individual BC susceptibility SNPs. Although individual associated SNPs may confer only modest disease risk, the combined effect of all known associated SNPs on risk can be substantial. Breast cancer PRSs identify differences in genetic risks and provide a straightforward basis for designing personalized screening programs by accounting for individual genetic susceptibility. 16 Currently, PRSs have not yet been implemented in routine BC screening, but simulations have suggested that preventive activities informed by risk profiles could provide cost savings and health benefits.17,18 A high risk estimate could also be an indication for the use of hormonal chemoprevention. 19
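The combination of SNP effects into a single score can be illustrated with a short sketch. It assumes the common additive form, in which the raw PRS is the sum of effect-allele dosages weighted by per-SNP log odds ratios, followed by standardization against a reference sample. The weights, dosages, and function names below are hypothetical and are not taken from any of the evaluated models.

```python
from statistics import mean, pstdev


def raw_prs(dosages, weights):
    """Weighted sum of effect-allele dosages (0 to 2) over the SNPs of a model."""
    if len(dosages) != len(weights):
        raise ValueError("one weight per SNP is required")
    return sum(d * w for d, w in zip(dosages, weights))


def standardize(scores):
    """Convert raw scores to z-scores using the mean and SD of the sample."""
    mu = mean(scores)
    sd = pstdev(scores)
    return [(s - mu) / sd for s in scores]


# HYPOTHETICAL per-SNP log odds ratios for a 4-SNP toy model.
weights = [0.05, -0.02, 0.10, 0.03]

# HYPOTHETICAL dosages for three women; imputed dosages may be fractional.
cohort = [
    [0, 1, 2, 1],
    [1, 1, 0, 0],
    [2, 0, 1.5, 2],
]

scores = [raw_prs(dosages, weights) for dosages in cohort]
z_scores = standardize(scores)
```

In practice the mean and standard deviation would come from a large population reference rather than from the tested sample itself.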
There is so far no consensus clinical model for the systematic implementation of PRS in BC personalized screening. 20 The current report describes the development of a PRS test as a clinical tool for BC risk-stratified screening, a model set of recommendations for the clinical implementation and the first results of application in real-life clinical practice.
Methods
We conducted a 2-phase investigation. In the first phase, we retrospectively validated the performance of PRSs using both prevalent and incident data sets from 2 genetic biobanks, with the aim of finding the best performing model for a PRS test. In the second phase, we performed a retrospective analysis of the results obtained during the routine clinical implementation of the test. The development of the PRS test as a clinical tool for the characterization of individual polygenic BC risk is described in detail in Appendix 1.
Study cohorts for the PRS development
Breast cancer data sets with genotyped data were acquired from 2 population biobanks: the Estonian Biobank of the Estonian Genome Center at the University of Tartu (EstBB) and the UK Biobank (UKBB). Quality-controlled samples were divided into prevalent and incident data sets. In the EstBB cohort, we retained a total of 32 548 quality-controlled female samples. The prevalent data set contained 315 cases of BC that were diagnosed before biobank recruitment and 1602 controls. The incident data set contained 365 cases of BC that were diagnosed after biobank recruitment and 30 266 controls. The UKBB data set contained 249 062 samples that passed the quality controls. In the UKBB, we identified 8637 prevalent cases and 6825 incident cases that were complemented with 44 952 controls and 188 648 controls, respectively. Prevalent data sets were used for identifying the best candidate model and the incident data sets were used to obtain an independent PRS effect estimate on BC status.
Selection and analysis of candidate PRS models
The literature search for publicly available PRS models was performed with the Google Scholar and PubMed web search engines. The list of articles returned by the search [“Polygenic risk score” or “genetic risk score” and “breast cancer”] was manually checked against the inclusion criteria.
We evaluated the relationship between BC status and standardized PRS in the 2 prevalent data sets with a logistic regression model to estimate the logistic regression–based odds ratio per 1 standard deviation of PRS (ORsd), its p value, model Akaike information criterion (AIC), and area under the receiver operating characteristic (ROC) curve (AUC). We also pruned from the PRS models any multi-allelic, non-autosomal, or non-retrievable variants (based on bioinformatics re-analysis with Illumina GSA-24v1 genotypes), as well as variants that did not overlap between the EstBB and UKBB data.
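The quantities named above (ORsd and AIC) can be obtained from a one-covariate logistic regression. The following is a minimal sketch, assuming a model with an intercept and the standardized PRS as the only predictor, fitted by Newton-Raphson. The function name and the toy data are hypothetical, and the study's own software and covariates are not specified in this section.

```python
from math import exp, log, sqrt


def fit_logistic(z, y, max_iter=50, tol=1e-10):
    """Fit logit P(y = 1) = a + b * z by Newton-Raphson (z: standardized PRS)."""

    def score_and_information(a, b):
        p = [1.0 / (1.0 + exp(-(a + b * zi))) for zi in z]
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * zi for yi, pi, zi in zip(y, p, z))
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * zi for wi, zi in zip(w, z))
        h11 = sum(wi * zi * zi for wi, zi in zip(w, z))
        return p, g0, g1, h00, h01, h11

    a, b = 0.0, 0.0
    for _ in range(max_iter):
        _, g0, g1, h00, h01, h11 = score_and_information(a, b)
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) < tol and abs(db) < tol:
            break

    # Recompute fitted values and the information matrix at the final estimate.
    p, _, _, h00, h01, h11 = score_and_information(a, b)
    det = h00 * h11 - h01 * h01
    se_b = sqrt(h00 / det)  # standard error of the slope
    loglik = sum(yi * log(pi) + (1 - yi) * log(1.0 - pi) for yi, pi in zip(y, p))
    return {
        "or_per_sd": exp(b),
        "ci95": (exp(b - 1.96 * se_b), exp(b + 1.96 * se_b)),
        "aic": 2 * 2 - 2 * loglik,  # two parameters: intercept and slope
        "fitted": p,
    }


# HYPOTHETICAL toy data: standardized PRS values and case (1) / control (0) status.
z = [-1.8, -1.2, -0.9, -0.4, -0.1, 0.2, 0.5, 0.9, 1.3, 1.9]
y = [0, 0, 0, 1, 0, 0, 1, 0, 1, 1]
fit = fit_logistic(z, y)
```

A real analysis would normally rely on an established statistics package, which also reports the p value and handles additional covariates.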
We selected the candidate model with the highest AUC to independently assess risk stratification in the incident data sets.
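Selection by AUC can be sketched as follows. The AUC is computed here as the probability that a randomly chosen case has a higher score than a randomly chosen control, with ties counted as one half, which is equivalent to the area under the ROC curve. The model names and scores are hypothetical placeholders, not the evaluated PRS models.

```python
def auc(case_scores, control_scores):
    """AUC as the share of case-control pairs in which the case scores higher."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# HYPOTHETICAL standardized scores per candidate model: (cases, controls).
candidates = {
    "model_A": ([0.4, 0.9, 1.3], [0.1, 0.5, 1.0, -0.2]),
    "model_B": ([0.2, 0.3, 1.1], [0.4, 0.6, 1.0, -0.5]),
}

auc_by_model = {name: auc(cases, controls) for name, (cases, controls) in candidates.items()}
best_model = max(auc_by_model, key=auc_by_model.get)
```

The pairwise loop is quadratic in sample size; for biobank-scale data a rank-based implementation would be used instead.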
The main aim of the analyses in the incident data sets was to derive a primary risk stratification estimate, hazard ratio per 1 unit of standardized PRS (HRsd), using a right-censored and left-truncated Cox regression survival model. We also assessed the goodness-of-fit of the survival model using Harrell’s C-index and the likelihood ratio test.
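Harrell’s C-index can be illustrated with a small pairwise computation. This is a simplified sketch for right-censored data only: it ignores left truncation and pairs with tied event times, both of which a full survival analysis package would handle. The function name and the toy follow-up data are hypothetical.

```python
def harrell_c(times, events, scores):
    """Share of comparable pairs in which the higher score has the earlier event.

    A pair is comparable when the woman with the shorter follow-up time had an
    observed event. Tied scores count as one half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored woman cannot be the earlier event of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    return concordant / comparable


# HYPOTHETICAL follow-up times (years), event indicators, and standardized PRS.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
prs = [1.0, 2.0, 0.5, 0.1]
c_index = harrell_c(times, events, prs)
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect ranking of event times by the score.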
Furthermore, we evaluated the concordance between theoretical hazard ratio estimates derived with the continuous per unit PRS (HRsd) estimate and the hazard ratio estimates inferred empirically from data.
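One way to obtain theoretical hazard ratios from HRsd is sketched below. It assumes that the standardized PRS is normally distributed and that the log hazard is linear in the PRS, so that the hazard of a woman at z-score z relative to z = 0 is HRsd raised to the power z. The mean relative hazard within a percentile band then follows from the truncated normal distribution. The function names and the chosen bands are illustrative assumptions, not the authors' exact procedure.

```python
from math import exp, log
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standardized PRS assumed to follow N(0, 1)


def hr_at_percentile(hr_sd, percentile):
    """Hazard ratio at a PRS percentile relative to the median (z = 0)."""
    z = STD_NORMAL.inv_cdf(percentile / 100.0)
    return hr_sd ** z


def mean_hr_in_band(hr_sd, lower_pct, upper_pct):
    """Mean of exp(beta * z) over a percentile band, relative to z = 0."""
    beta = log(hr_sd)
    p_lo, p_hi = lower_pct / 100.0, upper_pct / 100.0
    shifted_lo = STD_NORMAL.cdf(STD_NORMAL.inv_cdf(p_lo) - beta) if p_lo > 0 else 0.0
    shifted_hi = STD_NORMAL.cdf(STD_NORMAL.inv_cdf(p_hi) - beta) if p_hi < 1 else 1.0
    return exp(beta ** 2 / 2.0) * (shifted_hi - shifted_lo) / (p_hi - p_lo)


def theoretical_hr(hr_sd, band, reference_band):
    """Ratio of mean relative hazards between two percentile bands."""
    return mean_hr_in_band(hr_sd, *band) / mean_hr_in_band(hr_sd, *reference_band)


# Example: top 1% against the middle quintile, using the reported HRsd of 1.66.
top_vs_middle = theoretical_hr(1.66, (99, 100), (40, 60))
```

The empirical counterpart would be a hazard ratio estimated by fitting the survival model with band membership as a categorical predictor.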
For individual 10-year absolute BC risk calculations, we used the risk model developed by Pal Choudhury et al. 21 This absolute risk model accepts disease background data from any country. In the current analysis, we used Estonian background information. Theoretical proportions of individuals belonging to risk groups were derived by extracting relative risk estimates for PRS percentiles 1 to 100 from the Pal Choudhury et al model. Conformance of the observed counts of individuals in each risk group to the theoretical values was assessed using 2-tailed exact binomial tests.
Detailed methods and data sources for these analyses are described in Appendix 1.
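A 2-tailed exact binomial test can be sketched as follows. The sketch assumes the common definition in which the p value is the total probability of all outcomes that are no more likely than the observed count, and it evaluates the probabilities in log space so that large sample sizes do not overflow. The function names are hypothetical, and the expected share in the example is backed out from rounded counts quoted in the Results, so the resulting p value need not match the reported one.

```python
from math import exp, lgamma, log


def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials, computed in log space."""
    log_pmf = (
        lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
        + k * log(p) + (n - k) * log(1.0 - p)
    )
    return exp(log_pmf)


def exact_binomial_two_sided(k, n, p, rel_tol=1e-7):
    """Sum the probabilities of all outcomes no more likely than the observed one."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    threshold = binom_pmf(k, n, p) * (1.0 + rel_tol)
    total = 0.0
    for i in range(n + 1):
        prob = binom_pmf(i, n, p)
        if prob <= threshold:
            total += prob
    return min(1.0, total)


# Example with counts quoted in the Results: 150 observed among 2637 tests,
# against an expected share approximated as 186 / 2637 (an assumption).
p_value = exact_binomial_two_sided(150, 2637, 186 / 2637)
```

The small relative tolerance guards against floating-point noise when two outcomes have equal probability.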
Development of the clinical implementation model and clinical recommendations
We evaluated PRS risk stratification in the Estonian BC screening context and simulated the extent of risk separation in the Estonian population. Women in Estonia currently start BC screening at age 50. Our analysis first established the 10-year risk of a 50-year-old woman with a population average of PRS (“average female”) using the model by Pal Choudhury et al 21 : the reference for the level of risk that initiates population-level screening. Here, we assessed the differences in ages where individuals in various PRS risk percentiles attain 1-fold to 3-fold risk increases compared with the 10-year risk of an average woman.
Based on these analyses, we developed recommendations for a BC screening attendance program based on prescreening PRS testing. This approach uses both relative risks, a fold difference of 10-year risks compared with a genetically average woman of the same age, and her absolute 10-year risk.
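The idea of finding the age at which a woman attains the 10-year risk of an average 50-year-old can be sketched with a deliberately simplified risk calculation. The sketch below uses a hypothetical age-specific incidence curve, applies the PRS-based relative risk as a constant multiplier, and ignores competing mortality. It is not the Pal Choudhury et al model and does not use Estonian background data.

```python
from math import exp


def annual_incidence(age):
    """HYPOTHETICAL BC incidence per woman-year; not Estonian registry data."""
    return 0.0001 * exp(0.08 * (age - 20))


def ten_year_risk(age, relative_risk=1.0):
    """10-year risk from the cumulative hazard, ignoring competing mortality."""
    cumulative_hazard = sum(annual_incidence(a) for a in range(age, age + 10))
    return 1.0 - exp(-relative_risk * cumulative_hazard)


# Reference level: 10-year risk of a genetically average 50-year-old woman.
REFERENCE_RISK = ten_year_risk(50)


def age_reaching_reference(relative_risk, min_age=30, max_age=80):
    """First age at which the 10-year risk reaches the reference, or None."""
    for age in range(min_age, max_age + 1):
        if ten_year_risk(age, relative_risk) >= REFERENCE_RISK:
            return age
    return None


def recommended_start(relative_risk, current_age):
    """Never recommend a starting age below the woman's current age."""
    reached = age_reaching_reference(relative_risk)
    if reached is None:
        return None
    return max(reached, current_age)


start_high_risk = recommended_start(2.0, 35)
start_average = recommended_start(1.0, 35)
```

With a real absolute risk model, the relative risk would itself depend on the PRS percentile, and the search could return no age at all for women at the lowest percentiles.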
The technical documentation and the whole testing pipeline were created, and the BC PRS test was registered as a medical device (IVD) in the Estonian Medical Devices Database (EMDDB code: 14726). For clinical use, a laboratory test report (sample in Appendix 2) and a more detailed report for written post-test counseling were created. The test was implemented at a clinical-grade level as a medical device–based healthcare service by OÜ Antegenes or by partner health care institutions in Estonia. The test is offered either as home-based testing or as testing at a clinical site, in both cases with a buccal swab for DNA extraction, or it uses existing genotype data in the Estonian Biobank. The standard procedure includes the use of Illumina Global Screening Array-24 (GSA) v3.0 chip and Illumina iSCAN sequencer for genotyping. This workflow genotypes ~762 000 markers on the GSA chip by Illumina’s Infinium HTS (high-throughput screening) protocol (Illumina Inc, http://www.illumina.com; Document 15045738 v04). The test can also use information from other microarrays and sequencing approaches that output DNA data broadly covering the human genome. The PRS test performs the risk assessment based on imputed genotype data. Quality-controlled markers resulting from genotyping are imputed using a 1000G panel with reference to the human genome GRCh37. The HRC and TOPMed panels were not adopted due to the requirement of using external imputation servers.
Each test was preceded by informed consent. In addition to written information, pretest and posttest oral counseling were offered if needed. All women received PRS test reports directly via the web portal, but the results are also transmitted to the national health information system, where they are also available to other health care providers. According to the test results and recommendations, women are referred to mammography screening institutions or breast clinics for further personalized screening. For additional PV testing recommendations, we use a questionnaire about family cancer history with recommended indications for PV testing. 22 Current official BC screening in Estonia is for the age group 50 to 69 years; for the analysis, we divide women into 2 groups: age 30 to 49 (prescreening age) and 50 to 83 (screening age and older) years.
Results
Selection of the PRS model for the clinical PRS test
Altogether, we chose 4 models from 3 different articles to be evaluated: PRS313, 23 PRS77, 15 PRS78, 24 and PRS3820 23 (original numbers of variants; Table 1). The best performing model was selected based on AUC, ORsd, AIC, and pseudo-R2 metrics in both EstBB and UKBB data. The PRS2803 model (based on the Mavaddat et al PRS3820 model) 23 performed the best (Table 1).
Table 1. Comparison metrics of breast cancer PRS models based on the prevalent Estonian Genome Center and UK Biobank data sets.
Abbreviations: PRS, polygenic risk score; AUC, area under the receiver operating characteristic curve; OR, odds ratio; CI, confidence interval; AIC, Akaike information criterion.
Development of a set of clinical recommendations
To apply a PRS in clinical practice, testing must be complemented by clinical recommendations for preventive activities at different levels of risk. Based on the PRS-induced risk differences, we developed personalized recommendations that rely on relative risks compared with an individual of the same nationality, age, and sex, and on the estimated absolute risks. The recommendations presented in Table 2 are based on the age when an individual attains the risk level of a genetically average 50-year-old woman, as the average risk level at the age of 50 is a generally accepted threshold for starting BC screening; they also draw on the analogy with clinical recommendations for moderate-risk PV carriers. 22
Table 2. Recommendations for personalized screening based on different PRS risk levels.
Abbreviations: PRS, polygenic risk score; BC, breast cancer.
If the recommended age is below the individual’s current age, then we recommend the current age.
Results from the clinical implementation
Between September 1, 2020, and February 28, 2022, 2637 BC PRS tests with clinical recommendations were performed for women between the ages of 30 and 83 in the Estonian health care setting. The distribution of patients into risk groups and the comparison with theoretical PRS risk distributions are shown in Table 3. As the PRS follows a normal distribution in the population, it is possible to calculate the theoretical numbers of individuals at different risk levels.
Table 3. Women’s distribution according to breast cancer PRS risk levels in clinical practice in Estonia and comparison with theoretically calculated risk distribution percentages (CI using 2-sided exact binomial test).
Abbreviations: PRS, polygenic risk score; CI, confidence interval.
Two-sided exact binomial test p value < 0.05
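The calculation of theoretical numbers of individuals at different risk levels from a normally distributed PRS can be sketched as follows. The sketch assumes a log-linear relationship between the standardized PRS and risk, with risk expressed relative to the population average, and uses the reported HRsd of 1.66 purely as an illustrative effect size. The theoretical percentages in Table 3 were derived from percentile-specific relative risks of the Pal Choudhury et al model, so this simplified version is not expected to reproduce them.

```python
from math import log
from statistics import NormalDist


def share_above(rr_threshold, hr_sd):
    """Share of women whose risk relative to the population average exceeds a threshold.

    Assumes relative risk = exp(beta * z - beta**2 / 2) for z ~ N(0, 1), where
    the second term rescales risk to the population average.
    """
    beta = log(hr_sd)
    z_cut = (log(rr_threshold) + beta ** 2 / 2.0) / beta
    return 1.0 - NormalDist().cdf(z_cut)


def expected_counts(n_tested, hr_sd, thresholds=(2.0, 3.0)):
    """Expected number of tested women above each relative risk threshold."""
    return {t: n_tested * share_above(t, hr_sd) for t in thresholds}


# Illustrative use with the number of tests reported in this section.
counts = expected_counts(2637, 1.66)
```

Observed counts in each risk group can then be compared with such expectations using an exact binomial test.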
In the age group of 30 to 49 years, the BC PRS test has been applied to 1881 women. Testing detected 318 (16.9%) women whose risk level is already as high as or higher than the average at age 50. A total of 141 (7.5%) women had risk levels 2 times higher than average and could discuss more frequent mammography and hormonal chemoprevention. Thirty-one (1.6%) tested women had risk levels 3 times higher than average and are candidates for screening magnetic resonance imaging (MRI).
In the age group of 50 to 83 years, the BC PRS test has been applied to 756 women. We detected 47 women (6.2%) who have 2 times higher PRS risk levels and are candidates for annual screening mammography and for hormonal chemoprevention. Seven women (0.9%) had risk levels 3 times higher than average; they received recommendations for additional MRI screening.
In combined analysis of all tested women, we note a small, but statistically significant (binomial test p = 0.0075) underrepresentation (n observed = 150; n expected = 186) of women belonging to risk group 3 (2-3× increase in relative risk). This difference is present, but not statistically significant in both 30 to 49 and 50 to 83 age groups and is not statistically significant after Bonferroni correction for 12 tests. No significant deviations from expectation were observed in other age and risk groups.
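The effect of the Bonferroni correction on the reported p value can be checked with simple arithmetic: the raw p value is multiplied by the number of tests and capped at 1. The function name is hypothetical; the inputs are the values quoted in the paragraph above.

```python
def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p value, capped at 1."""
    return min(1.0, p_value * n_tests)


adjusted = bonferroni(0.0075, 12)    # about 0.09
still_significant = adjusted < 0.05  # False at the 0.05 level
```

Equivalently, the raw p value of 0.0075 can be compared with the corrected threshold 0.05 / 12, which is about 0.0042.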
Discussion
Risk-based BC screening, in which individualized risk assessment is used to inform screening practices, has been proposed as an alternative to age-based screening.7,8 This requires risk prediction tools. Age alone is an imperfect marker for BC risk, given that genetic susceptibility, lifestyle factors, and reproductive history can affect a woman’s chance of developing BC. As around 30% of the total BC risk has been shown to be hereditary, 9 evaluation of genetic predisposition for BC can serve as a tool for risk-stratified screening.
A considerable part of BC genetic variation is explained by SNPs,14,15 the effects of which are summarized in a PRS. Breast cancer PRS identifies differences in genetic risks and provides a straightforward basis for designing personalized screening programs by accounting for individual genetic susceptibility. 16 Currently, PRSs have not yet been implemented in routine BC screening. 20 As a relatively small proportion of the population are carriers of BC PVs, we aimed to develop an additional BC risk prediction tool in the form of a PRS.
In this study, we validated different publicly available PRS models to find the best performing model for predicting the risk of BC. Our best performing model, named PRS2803, was a pruned version of the Mavaddat et al PRS3820 model, retaining 2803 of the 3820 SNPs. 23 Its performance was consistent with the authors’ results. Our model was used to design a novel absolute risk–based screening strategy. Using Estonian screening information and background data, the analysis identified more than 10-fold PRS-based risk differences between the extremes. It showed that 1% of women would need to join screening by the age of 34 and that more than 30% of individuals never attain the risk of a genetically average 50-year-old woman (the age when women conventionally start screening).
For the clinical implementation, we developed recommendations based mostly on 2 aspects: the average BC risk level currently accepted for routine BC public screening and the analogy with recommendations for moderate-risk PV carriers. 22 Using PRS, it is possible to divide the patient’s relative risk of developing BC into different levels compared with the average at the given age while accurately assessing the risk of a particular percentile. In Europe, mammography screening in the age group 50 to 69 years at 2-year intervals is currently a recognized standard practice, which reduces BC mortality. Consequently, the “zero point” of the risk level at the beginning of the screening is the average risk level of 50-year-old women. Detecting younger women with similar or higher risk levels from age 30 to 35 onward allows similar mortality reduction measures to be implemented for them, while avoiding screening measures for women with lower risk.
As PRS can predict BC risk levels similar to those of moderate-risk PVs (ATM, CHEK2, and others), we based our recommendations for mammography intervals and MRI use on the analogy with these variants. 22 It should be noted that we do not recommend individuals to join public screening programs later than the standard starting time as the potential benefits and losses from decreased intervals have not been separately validated. As the PRS test does not analyze PVs that significantly increase the risk of BC, our application model recommends additional counseling and testing for the PVs according to the widely acknowledged criteria for PV testing. 22 We expect that in the future all women should also be tested for PVs, as family history criteria may miss a proportion of actual PV carriers; currently, the main obstacle is the relatively high cost of PV tests.
An individual risk-based approach to BC screening allows for greater equivalence and equity among women regarding screening. If an average risk at age 50 (or at any other age) is a commonly accepted level for mammography screening, then personalized recommendations to start screening when an individual risk level reaches average risk create equivalence. If some individuals at an average risk level are currently offered interventions or screening, then it would be unfair to deny that to others with an equivalent or higher risk of disease. It is important to identify individuals who are at high risk but are currently invisible to the system. For all women, the PRS test can also serve as a tool for individual informed “shared decisions” about mammography screening participation. 6
Results from the clinical application of our approach show that actual results of PRS testing overlap well with expected results (Table 3). The detected underrepresentation of women belonging to risk group 3 can most likely be attributed to sampling error in PRS percentiles 93 to 99. Alternatively, the difference may arise from uncertainties in population-specific mean and standard deviation estimates used for converting the raw PRS values to z-scores. We are continuing to monitor this observation as more data become available.
Several combined risk prediction models incorporate traditional risk factors such as demographics, reproductive history, menopausal status, family history, previous biopsies, mammographic density, carrier status of PV, and PRS.25-28 The practical routine application of such compounded models in screening is complicated by the limited availability and quality of data. In practical settings, the data collection difficulties need to be weighed against expected gains. The feasibility, clinical utility, costs, and cost-effectiveness of risk-based programs using a comprehensive model versus a model with only PRS and additional PV testing require further evaluation. 20 Polygenic risk score alone has been shown to predict the risk of BC in individuals of European descent more accurately than current clinical models. 29 Van den Broek et al. have assessed the clinical utility of a first-degree BC family history and PRS to inform screening decisions among women aged 30 to 50 years. 30 Results suggested that BC family history and PRS could guide screening decisions before age 50 years among women at increased risk of BC with the potential to prevent more BC deaths for identifiable groups of women at high risk due to their BC family history and polygenic risk. Analysis by Wolfson et al. concluded that population-wide programs for BC screening that seek to stratify women by their genetic risk should focus first on PRS, not on more highly penetrant but rarer variants, or family history. 31 The PRS was most predictive for identifying women at high risk, while family history was the weakest.
The weakness of our approach is that it is not yet based on the results of randomized trials. Randomized clinical trials of screening interventions provide the strongest evidence of efficacy, although they have certain limitations: they take a long time to perform, and scientific progress during a long study period creates uncertainty about the relevance of the original study approach. Therefore, simulations and modeling studies can indicate which screening strategies are likely to be optimal in each setting. Such modeling studies have been started based on the approach described in the current report.
In conclusion, we have used a PRS-based model to develop a novel approach to BC screening and implemented it in clinical practice, together with a questionnaire on indications for additional testing for higher-risk PVs, to provide personalized recommendations. Our adapted PRS model identifies individuals at more than 3-fold risk and elucidates large differences in attaining the same level of absolute risk. The genetic risk-based recommendations can be applied prospectively by individuals and by institutions aiming to make screening provisions more efficient. Our approach is easily adaptable to other nationalities by using the population background data of other genetically similar populations. For different ethnicities, additional ethnicity-based validations are necessary. We have implemented the current BC PRS in Estonia for women with a European ethnic background, as the GWAS used for the current development were based on populations with a European background. Similarly, the clinical screening recommendations can be adapted to locality-specific screening environments if we can infer the absolute risk of the average woman in that locality.
Conclusions
In the additional validation of BC PRSs in EstBB and UKBB, the model with 2803 SNPs demonstrated improved performance compared with models with a smaller number of SNPs.
The PRS test separates different BC risk levels.
The BC PRS test is feasible to implement in clinical practice for risk-stratified BC prevention.
Appendix 1
Appendix 2
Acknowledgements
Our appreciation goes to everybody from the EstBB actively involved in and supporting the development and implementation of BC PRS into clinical practice. Also, we would not have been able to perform any analyses without the computation resources provided by the HPC Center of the University of Tartu. We would like to extend our gratitude to Siim Sõber for his invaluable contributions to the statistical analyses and further conceptualization in this scientific manuscript.
