Abstract
Breast cancer screening programs are still the object of harsh debate. In 2012, the Independent UK Panel reviewed the benefits and harms of mammography screening based on randomized trials, and the EUROSCREEN Working Group reviewed European observational outcome studies. Both concluded that screening programs should continue, while acknowledging that harms, such as false-positive results and overdiagnosis, can have a negative impact on a woman's life. Information on the balance sheet of the benefits and harms of breast cancer screening should help women and their physicians make an informed choice. The future challenge for breast screening programs is to assess the feasibility, acceptability, effectiveness and impact of risk-based screening in order to maximize benefit-to-harm ratios.
Background
The benefits and harms of screening for breast cancer have been, and still are, an object of debate in the scientific community and in the lay media. The controversy surrounding mammographic screening, which was especially fierce during the 1990s on the question of screening premenopausal women (40–49 years of age), was reignited by the publication of a Cochrane review in 2001 [1], which denied the efficacy of mammographic screening and focused instead on its harms, especially overdiagnosis of breast cancer and false-positive results. Overdiagnosis is defined as the diagnosis of a breast cancer that, in the absence of screening, would not have been clinically diagnosed during the woman's expected lifetime. It is considered the most important harm of screening because the women concerned are diagnosed with, and overtreated for, the disease. However, overtreatment is not limited to overdiagnosed breast cancer cases; it is a more general problem in breast cancer care. False-positive results are the most common harm: a woman is recalled after a mammogram for further assessment (with or without an invasive procedure, such as a biopsy), and the final result is the absence of breast cancer (a negative result).
In this paper, we discuss the reasons for, and the status of, this controversy and address the challenges to be faced in Europe. Population-based screening programs, launched in Europe at the beginning of the 1990s as public health initiatives, are now widespread in Europe but remain the object of persistent criticism, with some authors calling for organized screening programs to end. In response, the 'Independent UK Panel on Breast Cancer Screening' review was commissioned in 2010 by the UK government to provide an up-to-date assessment of the benefits and harms of population-based breast screening programs [2]. The review was based on randomized controlled trials (RCTs) and estimated the balance between benefit, defined as breast cancer mortality reduction, and harms. The UK Panel's conclusions supported the continuation of the national screening program in the UK. Other European countries are also continuing population-based, organized mammography screening programs, with the possible exception of Switzerland, where the Swiss Medical Board recently advised against implementing screening programs [3].
Screening programs in Europe
On the basis of the RCTs, the breast cancer mortality reduction related to screening was estimated at 20% at the end of the 1980s [4]. On the basis of these results, the British breast screening program was established in 1987 following the Forrest Report [5]. In the context of the Europe Against Cancer program, pilot projects [6] and then regional and national programs were implemented in several European countries. The regularly updated European guidelines for quality assurance in breast cancer screening and diagnosis were a major product of this effort [7]. In 2003, the Council of the European Union issued a recommendation on cancer screening, including breast cancer screening, which promoted communication and informed decision-making based on knowledge of the benefits and harms of screening. In 2008, the first European report [8] documented the spread of population-based breast cancer screening in Europe. A recent paper [9] estimated the number of women invited to attend European screening programs at 26 million in the age range 45–74 years (mostly 50–69 years).
In most European countries, screening has been set up as population-based public health programs that, in most cases, invite women aged 50–69 years (initially up to 64 years in the UK). Women in the target population receive an invitation and an appointment, undergo high-quality mammography and are reinvited after a fixed time (the interscreening interval). The European guidelines recommended mammography with two views of each breast as the only primary screening test, with double reading by dedicated radiologists. The guidelines suggested a minimum of 5000 mammograms read per year per radiologist, although dedicated radiologists in screening centers usually read more. Nonetheless, the performance of service screening programs differs across Europe, for example in recall and false-positive rates [10]. These differences were attributed to different diagnostic attitudes across countries. Differences in screening mammography regimens and performance should therefore be taken into account when each country evaluates the benefits and harms of its own screening program.
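Recall and false-positive rates, the indicators on which programs are compared above, are simple ratios of screening counts. The sketch below is a minimal illustration, assuming hypothetical counts and defining a false positive, as in the Background, as a recall that ends without a breast cancer diagnosis; the class and function names are illustrative only, and the false-positive rate is expressed per screened woman.

```python
from dataclasses import dataclass


@dataclass
class ScreeningRound:
    """Hypothetical counts for one round of a screening program."""
    screened: int          # women who had a screening mammogram
    recalled: int          # women recalled for further assessment
    cancers_detected: int  # recalled women with a final diagnosis of breast cancer


def performance(r: ScreeningRound) -> dict:
    """Recall rate, false-positive rate (per screened woman) and PPV of recall."""
    false_positives = r.recalled - r.cancers_detected  # recalls ending without cancer
    return {
        "recall_rate": r.recalled / r.screened,
        "false_positive_rate": false_positives / r.screened,
        "ppv_of_recall": r.cancers_detected / r.recalled,
    }


# Two hypothetical programs detecting the same cancers with different recall policies.
program_a = ScreeningRound(screened=10_000, recalled=300, cancers_detected=60)
program_b = ScreeningRound(screened=10_000, recalled=800, cancers_detected=60)
for name, program in (("A", program_a), ("B", program_b)):
    print(name, {k: round(v, 3) for k, v in performance(program).items()})
```

With the same number of cancers detected, the program that recalls more women produces more false positives and a lower positive predictive value, which is the kind of between-program difference described above.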
The evaluation of population-based screening impact
In 2012, the EUROSCREEN Working Group reviewed the studies conducted in European countries to assess the effectiveness and adverse effects of organized screening programs. Estimates of breast cancer mortality reduction, overdiagnosis and false-positive results allowed the first European balance sheet of benefits and harms to be calculated [11].
Breast cancer mortality reduction
In the EUROSCREEN review, three observational study designs were considered: trend studies, incidence-based mortality studies and case–control (CC) studies. Trend studies have been criticized as tools for evaluating organized screening [12]. The main sources of bias are the long, gradual implementation of screening programs and the inclusion of a large proportion of breast cancer cases diagnosed in women who did not have the opportunity to participate in screening. Trend studies are typically based on demographic data and do not take into account the mode of diagnosis in relation to screening, in other words, whether the cancer was diagnosed at screening (at the first or a repeat examination), in the interval between screens or clinically, outside screening. Furthermore, the estimate of the background incidence rate (in the absence of screening) is controversial and uncertain. A few studies are based on individual records, but most use aggregated data for the denominators and, sometimes, for the breast cancer cases as well. In short, analyses of mortality and incidence trends are problematic, especially when based on short time series. Nonetheless, some trend studies have received massive attention because their authors claimed that mammographic screening had no impact [13].
The most important observational study designs are incidence-based mortality (IBM) and CC studies. IBM studies [14] include only the breast cancer diagnoses, and the deaths from those cancers, that occur within cohorts or demographic populations of women who had the opportunity to be invited and/or screened in the screening program. Women may be diagnosed at screening (screen-detected) or clinically, some of them in the interscreening interval. The follow-up time of the comparison group is a critical issue in IBM studies: if it is too short, or not equal to the follow-up time of the study group, the chance of observing a change in breast cancer mortality is reduced.
The other crucial issue is defining the comparison group, in other words, a nonrandomized but comparable population of women who were not invited and/or not screened. Two kinds of comparison group exist: geographical and historical. A geographical comparison group is a population similar to the study group, living in a nearby city or area without screening during the same period. Geographical comparisons carry a risk of noncomparability bias, because the levels of incidence and mortality may differ between the two cities or areas compared. A historical comparison group consists of women observed for incidence and mortality before the start of the population-based, organized screening program (also defined as not-yet-invited women). Historical comparisons are also open to bias, because assumptions are needed to predict the underlying trends in breast cancer incidence or mortality in the absence of screening. Self-selection bias is avoided in IBM studies when invited women are compared with uninvited women, as in RCTs [15].
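The defining rule of an IBM analysis, counting only breast cancer deaths among cancers diagnosed after a woman entered the study period, can be sketched as follows. This is a minimal illustration on hypothetical toy records; the record fields and the function name are assumptions, and a real analysis would rely on cancer registry linkage and would still face the comparison-group problems described above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Woman:
    entry_year: float                # start of observation (e.g. first invitation)
    exit_year: float                 # end of follow-up (death, emigration or study end)
    diagnosis_year: Optional[float]  # breast cancer diagnosis, if any
    bc_death_year: Optional[float]   # death from breast cancer, if any


def ibm_rate(cohort: List[Woman]) -> float:
    """Breast cancer deaths per 100,000 person-years, counting only deaths
    from cancers diagnosed on or after entry (the incidence-based rule)."""
    deaths = 0
    person_years = 0.0
    for w in cohort:
        person_years += w.exit_year - w.entry_year
        diagnosed_after_entry = (w.diagnosis_year is not None
                                 and w.diagnosis_year >= w.entry_year)
        died_in_follow_up = (w.bc_death_year is not None
                             and w.bc_death_year <= w.exit_year)
        if diagnosed_after_entry and died_in_follow_up:
            deaths += 1
    return 1e5 * deaths / person_years


# Hypothetical toy cohorts (years since entry); real studies are far larger.
healthy = [Woman(0, 10, None, None) for _ in range(8)]
invited = healthy + [Woman(0, 8, 2, 8),    # diagnosed after entry: death counted
                     Woman(0, 5, -1, 5)]   # diagnosed before entry: death not counted
comparison = healthy + [Woman(0, 7, 3, 7), Woman(0, 9, 4, 9)]

rate_ratio = ibm_rate(invited) / ibm_rate(comparison)
print(f"IBM rate ratio, invited vs comparison: {rate_ratio:.2f}")
```

Excluding deaths from cancers diagnosed before entry keeps the invited group's mortality from being diluted by cancers that screening could never have affected.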
CC studies compare the individual screening histories of women who died of breast cancer (cases) with those of a sample of women from the same population who were alive at the time of death of the cases (controls). Cases and controls should have had an equal opportunity to participate in screening. The design of a CC study is complex but methodologically well understood [16]. CC studies conducted in parallel with the RCTs supported the RCT evidence of mortality reduction, although their estimates showed a larger reduction. This is largely explained by the dilution of the effect in randomized trials (intention-to-treat analysis) and by self-selection bias (which can act in either direction, for or against the effect of screening). The CC methodology is considered a powerful tool for monitoring the impact of service screening, albeit with well-known challenges such as self-selection bias [17]. Regarding the latter, a woman's participation in organized screening is a personal decision influenced by her personal characteristics; participation could therefore be associated with a higher (or lower) risk of breast cancer death. The risk of self-selection bias is thus always present when participating women are compared with nonparticipating women. For this reason, most high-quality observational studies estimate the mortality reduction among invited women. To estimate the mortality reduction among participating women, potential self-selection bias has to be controlled for [18].
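The dilution effect mentioned above has a simple arithmetic form. As a minimal sketch, assume no self-selection bias (non-attenders have the same breast cancer mortality as uninvited women) and a participation rate p. If screening multiplies mortality among attenders by RR_screened, then mortality among all invited women is p·RR_screened + (1 − p) times the uninvited rate, and this relation can be inverted to obtain the attenders' effect implied by an estimate among invited women. The function names and the inputs (20% reduction among invited women, 70% participation) are hypothetical. Methods that also control for self-selection, as required above, are more involved.

```python
def invited_rr(rr_screened: float, participation: float) -> float:
    """Relative risk among invited women implied by the effect in attenders,
    assuming non-attenders have the same risk as uninvited women."""
    return participation * rr_screened + (1.0 - participation)


def screened_rr(rr_invited: float, participation: float) -> float:
    """Inverse of invited_rr: the attenders' relative risk implied by an
    intention-to-treat estimate (same no-self-selection assumption)."""
    if not 0.0 < participation <= 1.0:
        raise ValueError("participation must be in (0, 1]")
    return 1.0 - (1.0 - rr_invited) / participation


# Hypothetical inputs: 20% mortality reduction among invited, 70% participation.
rr_s = screened_rr(0.80, 0.70)
print(f"implied reduction among attenders: {1 - rr_s:.1%}")  # larger than 20%
print(f"round trip back to invited women: {invited_rr(rr_s, 0.70):.2f}")
```

With full participation the two estimates coincide; the lower the participation, the larger the gap between the effect among invited and among screened women, even before any self-selection is considered.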
Table 1 shows the best estimates of breast cancer mortality reduction in the RCTs and observational studies, according to the published reports of the Independent UK Panel and the EUROSCREEN Working Group. When interpreting the results, it is important to recall that the RCTs started in the 1970s and 1980s, whereas the observational studies considered in the review were conducted from the late 1990s to the early 2000s. Furthermore, the RCT outcomes were reported for invited women, whereas the observational studies usually estimate outcomes separately for invited and screened women.
Table 1. Estimates of breast cancer screening outcomes from the EUROSCREEN Working Group and the UK Independent Panel. Overdiagnosis for the UK Panel is reported as Measure A and as Measure C.
Overdiagnosis
Mammographic screening is effective if the diagnostic anticipation of breast cancer, through effective treatment, changes the natural history of the disease. An excess incidence, measured relative to the rate of the disease in the absence of screening, is an expected consequence of starting organized, population-based screening. At the prevalence (first) screening round, slowly growing breast cancers have a higher probability of being detected, and a large proportion of them are at an early stage. The preclinical phase of breast cancer (the 'sojourn time') is long, so the follow-up needed to measure the impact of screening is long, too. The diagnostic anticipation before clinical symptoms, the lead time, is reflected in some biological characteristics of the tumor: the diameter of the lesion (usually approximated by the pathological T size), nodal invasion (approximated by the pathological nodal status) and nuclear grade. Biomolecular and genetic markers have changed the traditional classification of breast cancers, and the future use of the new classifications at population level should improve research on, and outcome evaluation of, breast cancer screening.
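Why slowly growing cancers dominate the prevalence round can be shown with a small simulation of length-biased sampling. This is a minimal sketch under hypothetical assumptions: slow and fast tumors arise equally often, preclinical onsets are uniform over the 20 years before a single screen, sojourn times are exponential (mean 6 years for slow and 1 year for fast tumors), and the test detects every tumor still in its preclinical phase. The function name and all parameter values are illustrative.

```python
import random


def prevalence_screen_share_slow(n_per_type: int = 20_000, window: float = 20.0,
                                 mean_slow: float = 6.0, mean_fast: float = 1.0,
                                 seed: int = 1) -> float:
    """Share of slow-growing tumors among those detected at one prevalence screen.

    Hypothetical model: preclinical onset uniform on [0, window], the screen
    at t = window, exponential sojourn times, perfect test sensitivity.
    """
    rng = random.Random(seed)
    detected = {"slow": 0, "fast": 0}
    for kind, mean_sojourn in (("slow", mean_slow), ("fast", mean_fast)):
        for _ in range(n_per_type):
            onset = rng.uniform(0.0, window)
            sojourn = rng.expovariate(1.0 / mean_sojourn)
            if onset + sojourn > window:  # still preclinical on the day of the screen
                detected[kind] += 1
    return detected["slow"] / (detected["slow"] + detected["fast"])


print(f"slow tumors among screen-detected: {prevalence_screen_share_slow():.1%}")
```

Although the two tumor types arise equally often in this model, about 85% of the tumors found at the screen are slow-growing, because a tumor with a long sojourn time is more likely to be in its preclinical phase when the screen takes place.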
Overdiagnosis is defined as the diagnosis, among screen-detected cases, of breast cancers that would never have surfaced as clinically diagnosed cancers during the woman's lifetime without the screening intervention [19]. It is the most important harm caused by screening because these women are treated for the disease. Overdiagnosed cases cannot be identified individually, but they are expected to be the slow-growing, less aggressive cancers, both in situ and invasive.
The follow-up duration of each individual after the screening period is an essential dimension of the overdiagnosis measure, because it is needed to disentangle the incidence excess inherent to breast cancer screening (cancers diagnosed earlier because of the lead time) from overdiagnosis, the excess of breast cancer diagnoses attributable to screening detection. At present, knowledge of tumor characteristics (pathological or radiological) or other markers is not sufficient to predict overdiagnosis in an individual woman, or to justify de-escalating therapeutic protocols in the absence of randomized trials showing equivalent efficacy [20].
Quantification of overdiagnosis
There is consensus that the best estimate of overdiagnosis is the cumulative difference in breast cancer cases, both in situ and invasive, between screened and unscreened cohorts, calculated not only after the end of the screening period but also after the compensatory drop [21].
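The consensus estimator is a running sum of yearly case differences between cohorts, read only after the compensatory drop. The sketch below uses hypothetical annual counts for two comparable, equally sized cohorts (one invited, one not) to show how stopping the count at the end of the screening period would include cancers that were merely diagnosed earlier because of the lead time, and so would overstate overdiagnosis. All counts and names are invented for illustration.

```python
from itertools import accumulate

# Hypothetical annual counts of breast cancer (in situ + invasive) in two
# comparable, equally sized cohorts, by year since the first invitation.
# Years 0-19: screening period; years 20-29: after the last invitation.
invited = [30] + [14] * 19 + [8, 9, 10, 11] + [12] * 6
uninvited = [12] * 30
LAST_SCREENING_YEAR = 19


def cumulative_excess(exposed, reference):
    """Running sum of yearly differences in case counts between two cohorts."""
    return list(accumulate(e - r for e, r in zip(exposed, reference)))


excess = cumulative_excess(invited, uninvited)
expected = sum(uninvited)  # cases expected without screening over 30 years
print("excess at end of screening:", excess[LAST_SCREENING_YEAR])  # 56
print("excess after compensatory drop:", excess[-1])               # 46
print(f"overdiagnosis after the drop: {excess[-1] / expected:.1%}")
```

Here the prevalence peak and the higher incidence during screening are partly offset by the lower incidence after the last invitation; only the excess that remains once the drop is complete is counted as overdiagnosis.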
However, the denominator used to quantify overdiagnosis and the follow-up duration are still matters of controversy, and estimates of overdiagnosis vary widely as a consequence of these methodological choices. Depending on the components, study design and follow-up duration, measures range from 0 to 54% of the expected number of cases in the absence of screening [22]. The Independent UK Panel [2], following previous work by De Gelder et al. [23], quantified overdiagnosis using four different measures. In each of them, the numerator is the absolute excess incidence of breast cancer cases. Furthermore, the Independent UK Panel's outcome measure applies to invited women, as estimated in the RCTs, whereas the measures estimated in European observational studies apply to both invited and screened women. Each measure answers a different question about the impact of this important harm of mammography screening. For example, from a population perspective, in line with the UK Independent Panel's Measures 'A' and 'B', which are similar, the question is: what proportion of all breast cancers diagnosed over 25 years of life are overdiagnosed, given that I accept the invitation to screening?
Alternatively, from the perspective of an individual woman, the UK Panel's Measure 'C' answers the question: what proportion of breast cancers diagnosed at screening or in the interval are overdiagnosed, with overdiagnosis assessed over the 25 years since the first invitation (a proxy for lifetime observation), given that I accept the invitation to screening and will have a cancer diagnosed at screening or in the interval?
Moss reviewed the randomized screening trials in 2005 [24] and categorized them according to whether the women in their control groups had been offered screening at the end of the study period. The UK Panel [2] followed this categorization and, from the Malmö [25] and Canadian trials, which did not offer screening at the end, estimated the excess incidence at the end of follow-up as 11% (Measure 'A') and 19% (Measure 'C'). The difference in denominators explains the change in the estimate of overdiagnosis from 11 to 19%. In the update of the Canadian trial, the authors used the number of screen-detected breast cancers as the denominator (Measure 'D'; 106 excess cases out of 484 screen-detected cancers), giving an estimate of 22% overdiagnosis [26]. The estimates from both the Canadian and Malmö trials have several limitations. The first is the duration of follow-up, even with 25 years of follow-up: although the Canadian trial had a long follow-up, some of the women enrolled were probably still undergoing screening when follow-up stopped (after 25 years, for example, a woman aged 40 at entry is 65 years old). The second is that the Canadian trial did not include in situ breast cancers in the analysis. Third, spontaneous screening has been reported in Malmö, contaminating the control group. Finally, in both Sweden and Canada, population service screening started after the end of the trials and both groups were invited to participate.
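Because every measure shares the same numerator (the absolute excess of cases), the measures differ only in their denominator, and the smaller the denominator, the larger the percentage. The sketch below expresses the same hypothetical excess against three hypothetical denominators that follow the definitions above, then checks the Canadian Measure D figure from the counts quoted in the text. The function name and all counts other than 106/484 are illustrative.

```python
def overdiagnosis_pct(excess_cases: float, denominator: float) -> float:
    """Excess cases as a percentage of the chosen denominator."""
    return 100.0 * excess_cases / denominator


# Hypothetical counts for an invited group: the same 50 excess cases
# expressed against three different denominators.
excess = 50
denominators = {
    "A: all cancers over the whole follow-up": 1000,
    "C: cancers diagnosed at screening or in the interval": 500,
    "D: screen-detected cancers": 400,
}
for label, denom in denominators.items():
    print(f"{label}: {overdiagnosis_pct(excess, denom):.1f}%")

# Measure D in the Canadian trial update, from the counts quoted in the text.
print(f"Canadian trial, Measure D: {overdiagnosis_pct(106, 484):.0f}%")  # 22%
```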
Problems with observational studies on overdiagnosis
Observational studies have contributed to the evaluation of overdiagnosis in breast cancer screening, and more are expected to contribute in the future. Unfortunately, the analytical methods used to estimate overdiagnosis vary between studies. For this reason, the Independent UK Panel decided to rely only on the RCTs, although it did review the evidence from observational studies. The best methods for estimating overdiagnosis in observational studies require following pairs of birth cohorts or enrollment-age cohorts, in other words, an intention-to-treat evaluation without randomization: one cohort is invited to screening and the other is not, and both are monitored over the whole screening period and the compensatory drop phase after screening ends (usually at 70 years of age in European organized programs) [21]. Analysis of incidence trends in national or regional populations based on cancer registry data, without individual data classifying women by their screening history, is affected by many problems. The intervention group should have had the opportunity to be screened and should show a high attendance rate. The reference population should be as comparable as possible to the invited population in terms of background incidence rate, breast cancer risk factors, socioeconomic status and use of health services other than mammography. Overdiagnosis studies have two major sources of bias. The first is the comparability of the two groups, one from a screened population and the other from a population without screening. The second is the so-called adjustment for lead time, in other words, the need for a sufficiently long follow-up after a woman's last invitation to screening in order to observe the compensatory drop phase [27].
The compensatory drop is the reduction in incidence that follows the diagnostic anticipation essential to screening. Lead time is intrinsic to mammographic screening; without it, effective treatment and, therefore, mortality reduction would not be possible.
The balance sheet of benefits & harms
The benefits and harms of screening estimated in RCTs and observational studies are usually presented as relative measures of effect. They indicate how much an intervention decreases or increases the risk of breast cancer occurrence or breast cancer death and are expressed as a relative risk or a similar measure. To be meaningful, however, the real impact of the intervention needs to be estimated in absolute terms (i.e., absolute benefits and absolute harms).
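The same relative effect can translate into very different absolute effects, depending on the baseline risk. A minimal sketch, assuming a hypothetical relative risk of 0.80 and two hypothetical baseline risks of breast cancer death, converts the relative measure into an absolute risk reduction and into the number of women who need to be invited to avoid one death (the reciprocal of the absolute risk reduction). The function name is illustrative.

```python
from typing import Tuple


def absolute_effect(baseline_risk: float, relative_risk: float) -> Tuple[float, float]:
    """Return (absolute risk reduction, number needed to invite) for a
    baseline risk and a relative risk (below 1 means a benefit)."""
    arr = baseline_risk * (1.0 - relative_risk)
    return arr, (1.0 / arr if arr > 0 else float("inf"))


# Same hypothetical relative risk applied to two hypothetical baseline risks.
for baseline in (0.03, 0.01):
    arr, nni = absolute_effect(baseline, 0.80)
    print(f"baseline {baseline:.0%}: {1000 * arr:.1f} deaths avoided per 1000, "
          f"about {nni:.0f} women invited per death avoided")
```

Tripling the baseline risk triples the absolute benefit while the relative risk is unchanged, which is why relative measures alone do not describe the impact for a woman or a population.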
For communication and decision-making, it is important to offer stakeholders not only the relative risk estimate but also an estimate, albeit imperfect, of the impact for an individual or a group. A decision-making scenario for breast cancer screening outcomes must cover a long period of screening for each woman screened. Service screening in Europe usually starts at 50 and continues until 69 years, with a screening interval of 2 years. After 70 years of age, women are no longer invited, and a compensatory reduction in the number of breast cancer cases diagnosed is observed. Recently, some countries have raised the upper age limit to 74 years, and some areas have lowered the lower limit to between 40 and 47 years. Because mammographic screening is a continuing, almost lifelong program, and the life expectancy of European women is long and increasing, a decision to stop invitations at a later age might change the risk of overdiagnosis. Indeed, estimates of the preclinical, asymptomatic phase of breast cancer are longer at older ages and can sometimes exceed the remaining life expectancy.
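The link between a later stopping age and overdiagnosis can be illustrated with a simple competing-risks calculation. As a strong simplifying assumption, treat the remaining lead time of a screen-detected cancer and the time to death from other causes as independent exponential variables with mean lead time L and other-cause mortality rate μ; the probability that the woman dies of other causes before the cancer would have surfaced clinically is then μ / (μ + 1/L). The function name and both sets of values are hypothetical and only show the direction of the effect.

```python
def p_overdiagnosed(mean_lead_time: float, other_cause_mortality: float) -> float:
    """P(death from other causes before clinical surfacing) for two
    independent exponential times (competing risks)."""
    surfacing_rate = 1.0 / mean_lead_time  # yearly rate of clinical surfacing
    return other_cause_mortality / (other_cause_mortality + surfacing_rate)


# Hypothetical values: a longer lead time and a higher other-cause mortality
# at older ages both raise the probability.
younger = p_overdiagnosed(mean_lead_time=3.0, other_cause_mortality=0.005)
older = p_overdiagnosed(mean_lead_time=5.0, other_cause_mortality=0.05)
print(f"younger woman: {younger:.1%}, older woman: {older:.1%}")
```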
The components of the proposed decision-making scenario are presented in the legend to Table 2. Calculated for 1000 women at the start of service screening, these estimates apply to a medium-sized European city where about 1000 women will be invited for the first time at 50 years of age. Of course, organized screening in such an area invites many cohorts each year, and the impact at community level will be determined by the coverage of the target population [12].
Table 2. Balance sheet of benefits and harms: UK Independent Panel and EUROSCREEN Working Group estimates.
Table 2 compares the balance sheets of the UK Independent Review [2] and the EUROSCREEN Working Group [12]. To present the outcomes of screening in absolute numbers, we applied the outcomes in Table 1 to the same scenario:
1. One thousand women aged 50 years, followed for 30 years.
2. The cumulative number of breast cancer cases expected in the absence of screening was estimated at 67, diagnosed from 50 to 79 years of age.
3. The cumulative number of breast cancer deaths expected in the absence of screening was estimated at 30 from 50 to 79 years of age (in practice, 19 of these 30 deaths occurred in women diagnosed at 50–69 years).
The UK review's overdiagnosis estimate (Measure C: 19%) was originally calculated using as denominator the breast cancer cases diagnosed at screening ages (50–69 years) and was then applied to the invited women. The EUROSCREEN estimates are Measure A.
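The arithmetic behind such a balance sheet can be sketched as follows. The scenario inputs (1000 women, 30 breast cancer deaths and 67 cases expected without screening) come from the list above; the mortality reduction (20%) and overdiagnosis fraction (5%) are illustrative placeholders, not the Table 1 estimates, and the function name is hypothetical. The sketch simply multiplies the expected number of cases by the overdiagnosis fraction; as the note on denominators above shows, whether that is appropriate depends on how each overdiagnosis estimate was defined.

```python
def balance_sheet(expected_deaths: float, expected_cases: float,
                  mortality_reduction: float, overdiagnosis_fraction: float) -> dict:
    """Absolute benefits and harms for one cohort (e.g. 1000 women)."""
    deaths_prevented = expected_deaths * mortality_reduction
    overdiagnosed = expected_cases * overdiagnosis_fraction
    return {
        "deaths_prevented": deaths_prevented,
        "overdiagnosed": overdiagnosed,
        "overdiagnosed_per_death_prevented": overdiagnosed / deaths_prevented,
    }


# Scenario from the text: 1000 women aged 50 followed for 30 years, with
# 30 breast cancer deaths and 67 cases expected without screening.
# The two effect sizes are illustrative only, not the Table 1 estimates.
sheet = balance_sheet(expected_deaths=30, expected_cases=67,
                      mortality_reduction=0.20, overdiagnosis_fraction=0.05)
for key, value in sheet.items():
    print(f"{key}: {value:.2f}")
```

The ratio of overdiagnosed cases to deaths prevented is the single figure most often quoted from such balance sheets, and it is sensitive to both effect sizes and to the denominators chosen.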
Table 2 (with legend) shows the benefit (mortality reduction) and harm (only overdiagnosis is considered, although there are other harms) as absolute numbers of lives saved from breast cancer and of breast cancer cases overdiagnosed. In the balance sheet based on the EUROSCREEN overview, the chance that population-based mammography screening saves a woman's life is greater than the chance of overdiagnosis. On the basis of its review, the UK Independent Panel estimated that about three overdiagnosed cases are identified and treated for each breast cancer death prevented. Both reviews concluded that the benefits of mammography screening outweigh the harms, but each woman will need to make her own decision based on the estimates provided.
Risk-based screening: a challenge for service screening in Europe
The success of one-size-fits-all screening programs has been paralleled by growing knowledge about risk factors for breast cancer. In particular, many studies have examined the relationship between breast density and elevated risk [28,29]. Mammographic breast density, which reflects the amount of radiographically dense tissue on a mammogram, is one of the strongest risk factors for breast cancer, but it also hinders the detection of cancer by mammography. Because density is generally higher in younger women, mammographic screening performs less well in pre- and perimenopausal women, a phenomenon already observed in the first breast cancer screening trials. Recent research further shows that the evaluation of breast density should be complemented by information on genomics (such as single-nucleotide polymorphisms) and proteomics (hormones and blood markers), as well as other environmental and lifestyle risk factors. Ongoing and planned research in prospective study cohorts and observational studies aims to improve the prediction of individual risk [30].
Integrating risk factor information into comprehensive risk prediction models will lead to proposals for new screening strategies tailored to a woman's individual risk of breast cancer. Tailored screening regimens could differ in several ways: different age limits for starting and stopping screening, different screening intervals, and the addition of new imaging techniques, such as tomosynthesis and 3D ultrasound, to mammography. Tailored, risk-based regimens are expected to improve benefit-to-harm ratios by intensifying screening in women found to be at high risk, while low-risk women may receive less intensive regimens [31]. At the same time, explaining the risk-based concept to women invited for screening will be a challenge [32]. If risk-based screening becomes the screening paradigm, novel opportunities will also arise for primary prevention. Once women at high risk (as determined, e.g., by family history, genetics or BMI) can be identified among the invited and screened population, preventive measures such as risk-reducing medication and lifestyle changes might be offered to modify their breast cancer risk [33]. Using an individual woman's risk is thus a potential key tool for reducing breast cancer mortality with improved benefit-to-harm ratios, and for reversing the increase in new breast cancer diagnoses through primary prevention. This new approach should be tested in comparative effectiveness studies.
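The idea of varying age limits, intervals and supplemental imaging by risk can be sketched as a lookup from a predicted risk to a regimen. Everything here is a hypothetical assumption: the predicted 10-year risk as input, the tier cut-offs and the regimens attached to them are placeholders for illustration, not recommendations. Only the middle tier (biennial screening from 50 to 69 years) reflects the European programs described earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Regimen:
    start_age: int
    stop_age: int
    interval_years: int
    supplemental_imaging: bool  # e.g. tomosynthesis or 3D ultrasound


# Hypothetical tiers keyed on a predicted 10-year risk (upper bound, regimen).
# Cut-offs and regimens are placeholders, not recommendations.
TIERS = [
    (0.015, Regimen(50, 69, 3, False)),  # below 1.5%: less intensive
    (0.050, Regimen(50, 69, 2, False)),  # 1.5% to below 5%: standard program
    (1.000, Regimen(40, 74, 1, True)),   # 5% or more: more intensive
]


def assign_regimen(ten_year_risk: float) -> Regimen:
    """Return the regimen of the first tier whose upper bound exceeds the risk."""
    if not 0.0 <= ten_year_risk <= 1.0:
        raise ValueError("risk must be a probability")
    for upper, regimen in TIERS:
        if ten_year_risk < upper:
            return regimen
    return TIERS[-1][1]  # a risk of exactly 1.0 falls in the top tier


print(assign_regimen(0.03))
```

In practice the hard parts are upstream of such a lookup: validating the risk model and showing, in comparative effectiveness studies, that each tier's regimen improves the benefit-to-harm ratio.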
Tailored breast cancer screening, which in Europe will build on the organized screening infrastructure and methodology, will have an impact on the healthcare system and on women's lives. Being able to test one's risk, alter one's lifestyle, take prophylactic measures or be offered more or less intensive screening regimens will impose significant additional responsibility on women and professionals [34,35]. Which screening and preventive strategies to use, and how much information a woman wishes to receive, are individual decisions, but society must support the decision-making process [36]. It is therefore essential that the behavioral, ethical, legal, regulatory and social implications of risk-based screening be investigated in parallel with improved risk prediction and comparative effectiveness, bearing in mind that different healthcare systems in Europe may require different implementation strategies.
Conclusion & future perspective
Population-based screening for breast cancer is widespread in Europe. Recent reviews have published balance sheets of benefits and harms and concluded that organized screening programs should continue. Nonetheless, the controversy around screening mammography continues. The next challenge for population-based service screening programs is to evaluate the feasibility and outcomes of tailored, risk-based screening that takes each woman's individual risk profile into account.
Executive summary
Screening mammography is an object of controversy in scientific and lay media.
Recent reviews of the evidence of randomized trials and observational studies in Europe concluded that population-based service screening for breast cancer should continue, given the estimated balance sheet of benefits and harms.
Communication of screening outcomes and the benefits and harms must improve and involve women in informed decision-making.
Population-based screening programs have been successful in reducing breast cancer mortality and contributed to the improvement of breast cancer care.
A future challenge is to assess the feasibility and impact of risk-based breast cancer screening, tailored to the individual woman's characteristics.
In Europe, this development should be based on the population-based organized screening experience and practice.
Footnotes
The authors have no relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript. This includes employment, consultancies, honoraria, stock ownership or options, expert testimony, grants or patents received or pending, or royalties.
No writing assistance was utilized in the production of this manuscript.
