Abstract
Objectives
To assess the impact of population-based mammographic screening on breast cancer mortality in Europe, considering different methodologies and limitations of the data.
Methods
We conducted a systematic literature review of European trend studies (n = 17), incidence-based mortality (IBM) studies (n = 20) and case-control (CC) studies (n = 8). Estimates of the reduction in breast cancer mortality for women invited versus not invited and/or for women screened versus not screened were obtained. The results of IBM studies and CC studies were each pooled using a random effects meta-analysis.
Results
Twelve of the 17 trend studies quantified the impact of population-based screening on breast cancer mortality. The estimated breast cancer mortality reductions ranged from 1% to 9% per year in studies reporting an annual percentage change, and from 28% to 36% in those comparing post- and prescreening periods. In the IBM studies, the pooled mortality reduction was 25% (relative risk [RR] 0.75, 95% confidence interval [CI] 0.69–0.81) among invited women and 38% (RR 0.62, 95% CI 0.56–0.69) among those actually screened. The corresponding pooled estimates from the CC studies were 31% (odds ratio [OR] 0.69, 95% CI 0.57–0.83), and 48% (OR 0.52, 95% CI 0.42–0.65) adjusted for self-selection.
Conclusions
Valid observational designs are those where sufficient longitudinal individual data are available, directly linking a woman's screening history to her cause of death. From such studies, the best ‘European’ estimate of breast cancer mortality reduction is 25–31% for women invited for screening, and 38–48% for women actually screened. Much of the current controversy on breast cancer screening is due to the use of inappropriate methodological approaches that are unable to capture the true effect of mammographic screening.
Introduction
Many countries implemented population-based screening following the results of the RCTs. 13 There are several reasons why the effectiveness of population-based service screening mammography may differ from that observed in the RCTs, including the wider base of professionals who are involved in screening and the improvement of mammographic and other techniques since the trials were conducted. 14 , 15 In RCTs and in some observational designs the effect of screening is measured by comparing women invited with women not invited. This comparison is influenced by the attendance rate and therefore reflects the performance of the programme, rather than the screening test itself. The effect estimate will be larger when comparing breast cancer mortality in screened women with that in non-screened women. 16 Service screening effectiveness will also be influenced by the extent of opportunistic screening. Although data on opportunistic screening are scarce, the increased use of mammography outside organized screening programmes may contribute to a reduction in breast cancer mortality. 17
The emphasis for evaluation has now shifted to population-based screening services, and observational studies will become the main contributors of new information on the impact of breast cancer screening as a public health policy. In this review, we focus on the reduction in breast cancer mortality as the principal benefit of screening, which is by definition a longterm commitment. Several studies corroborate that well-designed observational studies produce results that are similar to those from RCTs. 18 There are, however, specific difficulties in determining the impact of breast cancer screening.
A common first step in the evaluation of screening is to study trends in breast cancer mortality over time. However, the impact of service screening on breast cancer mortality observed in routine population statistics will take many years to emerge. 19 Firstly, with improved treatment, breast cancer survival is generally much higher than in the past while breast cancer incidence has increased in most countries. In combination, the number of deaths in the short term will be lower, but in the long term the absolute number of potentially preventable breast cancer deaths has increased. Secondly, it usually takes a number of years before a screening programme is fully implemented. Thirdly, most trend studies are not able to allow for breast cancers diagnosed in women before the start of the screening programme. 20 , 21 Finally, when there are no individual data, no corrections can be made for the varying participation behaviour of women invited. 22 Potential confounding, where factors other than screening may contribute to changes in breast cancer mortality, also presents a complication. Therefore, service-based screening programmes cannot be evaluated using only analyses of trends.
A further difficulty in determining the impact of screening is the typical absence of a readily available control population. Studies which were able to identify, albeit for a limited time period, a group of contemporaneous controls that were not (yet) invited for screening have mostly used the incidence-based mortality (IBM) approach. IBM studies estimate the impact of screening by calculating mortality rates based on breast cancer deaths occurring in women with breast cancer diagnosed after their first invitation to screening. 23 Using individual data in IBM studies can overcome many of the problems that affect trend analyses.
Case-control (CC), or case-referent, studies have also been used to evaluate the impact of service screening.24–29 A CC study compares breast cancer deaths (cases) with a sample of women who have not died from breast cancer, in terms of individual screening exposure. There is an efficiency gain in taking a sample of the population invited to be screened, rather than observing the entire population. 30 If correctly designed and analysed, the CC approach offers a valid and efficient method for estimating the impact of service screening programmes. 25
Our objective is to assess the impact of population-based screening with mammography on breast cancer mortality in Europe. A best estimate for the effectiveness of population-based screening in Europe will be provided, acknowledging the different methodologies and the limitations of the available data.
Methods
A systematic search of PubMed was performed, covering all papers published up to February 2011 (details in Appendix A). We identified 5011 English-language articles evaluating the effect of mammographic screening on breast cancer mortality in Europe. After inspection of titles and abstracts, 122 studies were considered to be relevant. These were reviewed and further selected using the following criteria: (a) the study represents original data on a population-based screening programme in Europe, (b) breast cancer mortality is reported, (c) the analysis includes at least some of the age groups between 50 and 69, and (d) one of the following observational research designs was used: trend, IBM or CC study. In addition, we only considered studies estimating the impact of current breast cancer screening programmes, and therefore excluded those which had less than three years’ overlap with the relevant current regional or national population screening programme. Based on these criteria, 83 studies were excluded on the following grounds: data from RCTs (n = 17), outcome measure is not breast cancer mortality (n = 20), insufficient overlap with current population-based programme (n = 11), data limited to younger or older women (n = 9), study reporting no new data or no analysis with regard to screening (n = 15), modelling study (n = 6), full paper not in English (n = 2), study on opportunistic screening (n = 2) and study on benign breast disease (n = 1).
In addition to the literature search, the Working Group added publications fulfilling the inclusion criteria but not identified by the search and new publications that became available after February 2011 (n = 5). Studies were summarized according to the three designs (see Table 1): trend studies, 21,31,41,44,46,48,52–56,64,67–71 IBM studies, 22,32–39,42,45,51,53,57–59,60–63 and CC studies. 15,40,43,47,49,50,65,66
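The selection counts reported above can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers and reference lists stated in the text; the variable names are illustrative, and the reading that reference 53 is counted under both the trend and the IBM design is an inference from the citation lists rather than something the text states.

```python
# Cross-check of the study-selection counts reported in the Methods.
# All numbers come from the text; names are illustrative only.

relevant_after_title_abstract = 122

exclusions = {
    "data from RCTs": 17,
    "outcome measure is not breast cancer mortality": 20,
    "insufficient overlap with current programme": 11,
    "data limited to younger or older women": 9,
    "no new data or no analysis with regard to screening": 15,
    "modelling study": 6,
    "full paper not in English": 2,
    "study on opportunistic screening": 2,
    "study on benign breast disease": 1,
}
added_by_working_group = 5

total_excluded = sum(exclusions.values())
publications = relevant_after_title_abstract - total_excluded + added_by_working_group

# Reference numbers as cited for each design in the text.
trend = {21, 31, 41, 44, 46, 48, *range(52, 57), 64, *range(67, 72)}
ibm = {22, *range(32, 40), 42, 45, 51, 53, *range(57, 64)}
cc = {15, 40, 43, 47, 49, 50, 65, 66}

unique_references = trend | ibm | cc
shared_between_designs = trend & ibm  # assumed to explain 45 design entries vs fewer papers

print(total_excluded, publications, len(trend), len(ibm), len(cc))
print(len(unique_references), sorted(shared_between_designs))
```

Under this reading, the number of distinct publications and the number of design-specific entries differ by the one reference that appears in two lists.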
Publications on the impact of population-based screening with mammography in Europe according to observational study design
In half of the Swedish counties, the lower age limit is 40; in the other half screening starts at age 50
Current age limits are 50–70 but will be extended to 47–73
Northern Ireland (UK) compared with the Republic of Ireland, the Netherlands compared with Belgium/Flanders and Sweden compared with Norway
Trend studies
Relevant papers were those that reported on trends in breast cancer mortality rates in a population as a whole in relation to the introduction and/or extent of population based mammographic screening (n = 17). They are described in detail elsewhere in this supplement. These studies were usually based on aggregated data obtained from routine sources, such as cancer registries. Trend studies were either classified into (a) descriptions of the trend over time in breast cancer mortality in relation to the timing of the introduction of population-based screening (n = 5), or (b) those which included a more detailed analysis with the aim of quantifying the impact of screening on mortality (n = 12). Methods of analysis in the latter category included Poisson regression (with or without age cohort modelling), and the use of joinpoint regression to identify ‘break points’ at which changes in mortality trends occurred (see Table 2). Due to the varied methodology and comparisons in the studies, no attempt was made to produce a pooled estimate of the effect of screening.
Summary of European trend studies that report an estimate of the effect of screening
CI, confidence interval
Northern Ireland (UK) compared with the Republic of Ireland, Sweden compared with Norway and the Netherlands compared with Belgium/Flanders
IBM studies
In an IBM study all breast cancer deaths occurring in a dynamic or cohort population over a period of time are enrolled in the study only if the breast cancer diagnosis occurred in a certain time/age window (taking into account eligibility and opportunity to be screened) and the population is classified by screening or by invitation to screening. Thus, for example, breast cancer deaths in the 15 years after screening is initiated in one region, from tumours diagnosed in that 15-year period, may be compared with the corresponding deaths from tumours diagnosed in the same period in a region without screening. The selection of IBM studies contributing to this overall review is described in detail elsewhere in this supplement. 23 There were 20 IBM studies - one each from Denmark, Norway and Spain, two from Italy, seven from Finland and eight from Sweden. A key issue in these studies is how the breast cancer mortality expected in the absence of screening is estimated. Another methodological concern is how the study deals with potential biases in the estimated mortality reduction due to screening. Because breast cancer cases are diagnosed earlier in screened women than in those who are not screened, a longer follow-up period for breast cancer deaths than the accrual period for cases will confer an artificial increase in mortality in the screening period due to fatal cases whose diagnosis is moved to the accrual period due to lead time. The same consideration applies to age at diagnosis. If mortality includes deaths from tumours diagnosed within a certain age range, but with no upper limit on age at death, there will be a number of fatal cancers diagnosed by screening within the age range, which would otherwise have been excluded as diagnosed symptomatically above the age range. 23
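The inclusion rule described above can be sketched as a simple filter on individual records. This is a minimal illustration, not the procedure of any particular study: the record layout, function name and parameters are hypothetical, and real IBM analyses also need person-years denominators and a classification by invitation or screening status.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class BreastCancerCase:
    # Hypothetical record layout, for illustration only.
    diagnosis_date: date
    age_at_diagnosis: int
    breast_cancer_death_date: Optional[date]  # None if no breast cancer death


def counts_as_ibm_death(case, accrual_start, accrual_end, followup_end,
                        min_age, max_age):
    """Return True if a breast cancer death enters the IBM numerator.

    The death is counted only if the diagnosis fell inside the accrual
    window and the eligible age range, and the death occurred no later
    than the end of follow-up.
    """
    if case.breast_cancer_death_date is None:
        return False
    diagnosed_in_window = accrual_start <= case.diagnosis_date <= accrual_end
    diagnosed_in_age_range = min_age <= case.age_at_diagnosis <= max_age
    died_in_followup = (case.diagnosis_date
                        <= case.breast_cancer_death_date
                        <= followup_end)
    return diagnosed_in_window and diagnosed_in_age_range and died_in_followup


# Example: accrual and follow-up end on the same date, so that deaths are
# not collected for longer than cases are accrued.
window = (date(1991, 1, 1), date(2005, 12, 31), date(2005, 12, 31), 50, 69)
incident = BreastCancerCase(date(1995, 6, 1), 60, date(2001, 3, 1))
prevalent = BreastCancerCase(date(1989, 6, 1), 55, date(1996, 3, 1))
print(counts_as_ibm_death(incident, *window))
print(counts_as_ibm_death(prevalent, *window))
```

The second record illustrates the point of the design: a death from a tumour diagnosed before the window opened is left out, because screening could not have influenced it.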
Table 3 presents some basic characteristics of the IBM studies. Where data overlapped between studies, the study used in this review was selected on the basis of follow-up time, judgement of quality of the comparison group and study size. We calculated a pooled estimate of the effect on breast cancer mortality in women invited versus not invited, as well as a pooled estimate for women screened versus not screened, using the formula described by Duffy et al. 72 The effect sizes were pooled using the inverse variance method (random effects model) and heterogeneity between the studies was assessed. 14 , 73
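Inverse-variance pooling under a random effects model can be sketched as follows. The text does not name the estimator of between-study variance, so the DerSimonian-Laird estimator used here is an assumption, as is the recovery of standard errors from the reported 95% confidence intervals. The input values are made up for illustration and are not the studies in Table 3.

```python
import math


def pool_random_effects(estimates, ci_lowers, ci_uppers, z=1.96):
    """Pool ratio estimates (RR or OR) on the log scale.

    Inverse-variance weights with a DerSimonian-Laird between-study
    variance (an assumption; the estimator is not named in the text).
    Returns (pooled_ratio, ci_lower, ci_upper, q_statistic, tau_squared).
    """
    logs = [math.log(e) for e in estimates]
    # Standard error recovered from the CI width on the log scale.
    ses = [(math.log(u) - math.log(l)) / (2 * z)
           for l, u in zip(ci_lowers, ci_uppers)]
    w = [1 / se ** 2 for se in ses]

    # Fixed-effect estimate and Cochran's Q for heterogeneity.
    fixed = sum(wi * yi for wi, yi in zip(w, logs)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, logs))
    df = len(logs) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_star = [1 / (se ** 2 + tau2) for se in ses]
    pooled = sum(wi * yi for wi, yi in zip(w_star, logs)) / sum(w_star)
    se_pooled = math.sqrt(1 / sum(w_star))
    return (math.exp(pooled),
            math.exp(pooled - z * se_pooled),
            math.exp(pooled + z * se_pooled),
            q, tau2)


# Made-up inputs for illustration; NOT the studies in Table 3.
rr, lower, upper, q, tau2 = pool_random_effects(
    [0.72, 0.80, 0.76], [0.60, 0.68, 0.62], [0.86, 0.94, 0.93])
print(round(rr, 2), round(lower, 2), round(upper, 2))
```

When Q does not exceed its degrees of freedom, the between-study variance is set to zero and the result coincides with the fixed-effect estimate.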
Design characteristics of European IBM studies, excluding those with overlapping data, and estimate of effect
NR, not required; NA, not adjusted; IBM, incidence-based mortality; CI, confidence interval
CC studies
A CC study is embedded in a cohort or a dynamic population and based on sampling of the population experience. Breast cancer deaths (cases) in the population are collected over the period of interest and controls who have not died of breast cancer are selected from the same population, often closely matched by temporal factors. Breast cancer cases and control subjects are then compared with respect to screening history before the date of diagnosis of the breast cancer case. The eight CC studies used in this review (Table 4) came from a recently published methodological overview, but we excluded non-European studies 26 and added publications by Broeders et al., 50 van Schoor et al. 15 and Otto et al. 47
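The comparison of screening history between cases and controls reduces, in its simplest unmatched form, to an odds ratio from a 2×2 table. The sketch below uses a Woolf (log-scale) confidence interval and hypothetical counts; matched designs such as those described above would normally be analysed with conditional methods instead, so this is only a simplified illustration.

```python
import math


def crude_odds_ratio(cases_screened, cases_unscreened,
                     controls_screened, controls_unscreened, z=1.96):
    """Crude OR for breast cancer death, screened versus not screened,
    with a Woolf (log-scale) confidence interval."""
    odds_ratio = (cases_screened * controls_unscreened) / (
        cases_unscreened * controls_screened)
    se_log = math.sqrt(1 / cases_screened + 1 / cases_unscreened
                       + 1 / controls_screened + 1 / controls_unscreened)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from any study in Table 4.
or_, or_lower, or_upper = crude_odds_ratio(
    cases_screened=100, cases_unscreened=100,
    controls_screened=400, controls_unscreened=200)
print(or_, round(or_lower, 2), round(or_upper, 2))
```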
Design characteristics of European case-control studies, and estimate of effect
Based on original publication, except for van Schoor (personal communication) and Broeders (personal communication)
Calculated using the formula by Duffy et al. (Appl Stat 2002) and using the crude OR
Index invitation is the most recent invitation before diagnosis of the breast cancer case
And in NETB Report XII
Limited to cancers diagnosed in 1995–2001, where the crude OR was 0.49 (0.36–0.66)
Based on an overall crude OR of 0.64 (0.44–0.92) (Broeders, personal communication) and self-selection factor of 1.08, 95% CI 0.85–1.37 (Paap et al. 87)
The results were pooled to obtain estimates of the effect on breast cancer mortality for women screened versus not screened, based on the crude odds ratios (ORs) as well as ORs adjusted for self-selection. In addition, intention to treat estimates were calculated, using the formula described by Duffy et al., 72 in order to compare the women invited with those not invited. Because the studies by Broeders et al. and van Schoor et al. were both conducted in Nijmegen, with overlap in the included cases, the former was excluded from the meta-analysis. The effect sizes were pooled as above. 14 , 73
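The two conversions mentioned above can be sketched as follows. The formulas are written as they are commonly stated for the correction of Duffy et al.; they follow from assuming that non-attenders have a risk Dr times that of uninvited women, but the notation and details should be checked against the original publication. The crude OR (0.64) and self-selection factor (1.08) are the values quoted in the Table 4 footnotes; the participation proportion is hypothetical, because it is not given in the text.

```python
def intention_to_treat_ratio(psi, p, dr):
    """Ratio for invited versus not invited women.

    psi: crude OR, screened versus unscreened among invited women
    p:   proportion of invited women who attend screening
    dr:  self-selection factor (risk in non-attenders relative to uninvited)
    """
    return dr * (p * psi + (1 - p))


def self_selection_corrected_ratio(psi, p, dr):
    """Ratio for screened versus not screened, corrected for self-selection."""
    return (p * psi * dr) / (1 - (1 - p) * dr)


crude_or = 0.64          # quoted in the Table 4 footnotes
self_selection = 1.08    # quoted in the Table 4 footnotes
participation = 0.80     # HYPOTHETICAL: not reported in the text

itt = intention_to_treat_ratio(crude_or, participation, self_selection)
corrected = self_selection_corrected_ratio(crude_or, participation, self_selection)
print(round(itt, 3), round(corrected, 3))
```

With a self-selection factor of 1 the corrected ratio equals the crude OR, and with full participation the intention to treat ratio equals the effect of being screened, which is a useful sanity check on any implementation.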
Breast cancer mortality as an outcome measure
Breast cancer mortality is the most appropriate primary endpoint for evaluating screening, although its use has been questioned. 74 , 75 An outcome parameter which avoids problems with cause of death classification is (refined) excess mortality from breast cancer, which includes all mortality associated with breast cancer, even indirectly caused deaths, such as treatment-induced mortality, or deaths caused by the stress imposed by the cancer. 76 However, this method, so far, has only been used in Sweden.
Potential limitations of using breast cancer mortality as an outcome measure are that there could be an increase in deaths attributed to breast cancer because more breast cancer cases are diagnosed in screened women, and the misclassification of breast cancer as the underlying cause of death because the treating physician is influenced by the screening history of the patient. Screening may also affect mortality from other causes, for example, due to complications arising from procedures triggered by screening. 75 However, several studies explicitly assessed the quality of cause-of-death determination in relation to mammographic screening and found no significant evidence of bias.77–80
Results
Trend studies
Of the 12 trend studies, three used joinpoint regression, and nine Poisson regression (Table 2). Five papers were based on all of an individual country (England, the Netherlands and Spain), two studied the programme in the city of Florence (Italy), two studied different regions in Spain and one studied two regions of Denmark. One paper included Northern Ireland, the Netherlands and Sweden in comparison with the Republic of Ireland, Belgium/Flanders and Norway, respectively. The most recent paper studied nine counties in Sweden.
Authors of several studies estimated the annual percentage change in mortality, while others presented a comparison between two distinct time periods. Of the former, estimates ranged from reductions of 1% to 9% per year; for those studies with adequate follow-up (at least 10 years from the date of full coverage by invitation) the estimates were 1%, 2.3–2.8% and 9%.31,46,48,52–55 Of the three studies comparing time periods within a single country, all had adequate follow-up, and the estimates of mortality reduction compared with a prescreening period ranged from 28% to 36%.41,53,64
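An annual percentage change is usually derived from the slope of a log-linear model of the mortality rate on calendar year. The sketch below uses an ordinary least-squares fit on log rates as a simplified stand-in for the Poisson and joinpoint regressions used in the studies; the function name and the example rates are hypothetical.

```python
import math


def annual_percentage_change(years, rates):
    """Estimate the annual percentage change (APC) from a least-squares
    fit of log(rate) on calendar year: APC = (exp(slope) - 1) * 100."""
    n = len(years)
    mean_year = sum(years) / n
    log_rates = [math.log(r) for r in rates]
    mean_log = sum(log_rates) / n
    sxx = sum((x - mean_year) ** 2 for x in years)
    sxy = sum((x - mean_year) * (y - mean_log)
              for x, y in zip(years, log_rates))
    slope = sxy / sxx
    return (math.exp(slope) - 1) * 100


# Hypothetical rates per 100,000 that fall by 2% each year.
years = list(range(1995, 2006))
rates = [60.0 * 0.98 ** (year - 1995) for year in years]
apc = annual_percentage_change(years, rates)
print(round(apc, 2))
```

A negative APC indicates a declining mortality rate; as the text notes, such a trend alone does not separate the effect of screening from that of other factors.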
IBM studies
Table 3 shows the design characteristics of the IBM studies. The outcomes were generally compatible when differences in methodology and local circumstances were taken into account. Details are given elsewhere in this supplement. 23 Those with the strongest designs had (a) expected breast cancer mortality estimated from a cohort of women not yet invited 39 or from historical and contemporaneous control groups; 32 , 36 and (b) an accrual period equal to the follow-up period for breast cancer deaths. 23
Using all IBM studies, excluding overlapping data-sets, produced a pooled relative risk (RR) estimate of 0.75 (95% confidence interval [CI] 0.69–0.81) for invitation to screening, with no significant heterogeneity (P = 0.23). The combined RR for women actually screened was 0.62 (95% CI 0.56–0.69), again with no significant heterogeneity (P = 0.40). Figure 1 shows the forest plots.
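The percentage reductions quoted throughout are one minus the pooled ratio, and the same transformation applies to the confidence limits with the bounds swapped. A short sketch using the pooled IBM estimates given above (the function names are illustrative):

```python
def percent_reduction(ratio):
    """Convert a relative risk or odds ratio into a percentage reduction."""
    return (1 - ratio) * 100


def reduction_with_interval(ratio, ci_lower, ci_upper):
    """Percentage reduction with its interval; the upper limit of the ratio
    gives the lower limit of the reduction, and vice versa."""
    return (percent_reduction(ratio),
            percent_reduction(ci_upper),
            percent_reduction(ci_lower))


# Pooled IBM estimates as reported in the text.
invited = reduction_with_interval(0.75, 0.69, 0.81)
screened = reduction_with_interval(0.62, 0.56, 0.69)
print([round(value) for value in invited])
print([round(value) for value in screened])
```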
Incidence-based mortality studies excluding overlapping data: (a) estimates for breast cancer mortality reduction in women invited versus not invited; (b) estimates for breast cancer mortality reduction in women screened versus not screened. ITT = intention to treat; PP = per protocol
CC studies
Of the eight CC studies included, one came from Iceland, one from Italy, four from the Netherlands and two from the UK (Table 4), but their designs were very similar. 26 The definition of exposure to screening was based on a comparison of women ‘ever’ screened versus women ‘never’ screened in four studies. All Dutch studies adopted the concept of the index invitation, defined as the invitation date closest to the date of diagnosis of the case. The comparison in these studies was between women screened in an exposure period which varied from one to three screening examinations versus women not screened in this period. All studies reported ORs adjusted for self-selection bias, either using the correction factor estimated by Duffy et al. 72 or their own correction factor, all closer to 1 than the Duffy factor. Based on the results in the original publications, we also calculated the reduction in breast cancer mortality for women invited versus not invited. 72
Seven CC studies were included in a pooled analysis (see Methods). The combined unadjusted OR was 0.46 (95% CI 0.40–0.54), a significant 54% reduction in breast cancer mortality for screened versus not screened women. This became a 48% reduction after adjusting for self-selection (OR 0.52, 95% CI 0.42–0.65). There was no evidence of heterogeneity in either analysis (P = 0.10 and 0.17, respectively). The combined mortality reduction for invitation to screening was 31% (OR 0.69, 95% CI 0.57–0.83), but with significant heterogeneity (P = 0.005). Figure 2 shows the forest plots. The squares representing the point estimates in the individual CC studies are proportional to the precisions of the log ORs. The order of these may vary when adjusted for self-selection bias as after adjustment the precision also depends on the standard error of the self-selection correction. This in turn depends on the participation rate in each study.
Case-control studies excluding overlapping data: (a) crude odds ratios for breast cancer mortality reduction in women screened versus not screened; (b) crude odds ratios for breast cancer mortality reduction, corrected for self-selection, in women screened versus not screened; (c) crude odds ratios for breast cancer mortality reduction translated to intention to treat estimates for women invited versus not invited
Discussion
Our overview indicates that the estimates from observational studies, using different study designs, are consistent with a breast cancer mortality reduction of 25–31% for women in Europe invited for population-based screening. The current best estimate of the effectiveness of European screening programmes is therefore at least as large as that observed in the longterm follow-up of the Swedish RCTs 81 or more recent meta-analyses. 74 , 82
Given the methodological limitations inherent in observational studies, and the differences in designs, the similarity in the effect estimates from trend, IBM and CC studies is noteworthy. Using all IBM studies without overlapping data, the reduction in breast cancer mortality for women invited was 25%. The corresponding intention to treat estimate in the CC studies was 31%. The relative reduction in breast cancer mortality for women who actually participated in screening was 38% based on IBM studies and 48% based on CC studies. Of the three trend studies comparing time periods within a single country, all had adequate follow-up, and the estimates of mortality reduction compared with a prescreening period ranged from 28% to 36%.
The choice of IBM studies to include in the case of overlapping data was not crucial to the estimated mortality reduction, because pooling all studies, including those with overlapping data, gave a mortality reduction of 24%, and selection of three studies on the basis of both historical and contemporaneous comparison groups gave a reduction of 26%. 23 The heterogeneity among studies of the intention to treat estimate from the CC studies is likely to be due to differing uptake rates between studies, because there was no significant heterogeneity when the effect of actually being screened was assessed.
The study and analysis of population breast cancer mortality rates can be a first step in evaluating the impact of screening on mortality. However, such analyses should be restricted to the age ranges likely to demonstrate a benefit from screening; they should attempt to exclude time periods where dilution due to deaths in women diagnosed preinvitation will be evident; and they should attempt to take account of past underlying trends. We do not support the recommendation of Harris et al. 83 to focus on a trend or ecological approach.
The most valid observational designs are those where longitudinal individual data are available, directly linking screening history to the cause of death, achieved using either an IBM or a CC approach. IBM studies and CC studies have one major feature in common - they typically take as clinical endpoint deaths from cancers which have been diagnosed in the age range and time period in which screening is offered. This avoids dilution bias associated with deaths from breast cancers in a given period from tumours diagnosed before that period began. 62 The most obvious difference between the two is that the CC study is retrospective and the IBM study prospective.
In the CC study, data on deaths from the cancer in question are collected along with that from subjects who have not died of the disease, and screening histories retrieved retrospectively. There are a number of well-known potential biases associated with this design, some conservative and some anticonservative. 24 , 84 However, these can be minimized by appropriate design or corrected for in the statistical analysis. 25 , 85 Some biases, such as residual confounding after adjusting for age, tend to be very small. 86
Typically in the IBM studies, rates of death from cancers diagnosed in a population and period of invitation to screening are compared with the corresponding rates in a population or period without such invitation. 59 This too has potential biases. There is likely to be confounding of some variables between populations and periods if individual data on invitation and screening are not available. For example, if a before-after comparison of IBM is carried out, the time cut-off will inevitably incur some misclassification of exposure to invitation, because screening is usually phased in over a period of years. 86 In the CC approach, individual screening histories are retrieved so there is no misclassification of exposure. 25
In principle, screening exposure can be ascertained for all subjects in the population in the IBM approach, but this involves retrieval of data on tens or even hundreds of thousands of subjects, whereas the CC design typically involves much smaller numbers. 15 Therefore, the CC approach is a more economic research strategy, even though it may involve more complex design or analytic procedures. However, if exposure to screening is ascertained for all study subjects on an individual basis in both study designs, the intention-to-treat estimate from CC studies should be similar to that from the IBM studies, as indeed is observed in this review.
Conclusion
After considering all published data from European studies, the reduction in breast cancer mortality associated with mammographic population-based service screening programmes is in the range of 25–31% for women invited for screening and 38–48% for women actually screened with sufficient follow-up time. It appears that much of the current controversy surrounding the value of mammography screening is due to the use of inappropriate methodological approaches that are unable to capture the true effect of mammographic screening.
EUROSCREEN WORKING GROUP
Coordinators:
Members:
Ancelle-Park R (F) 1 , Armaroli P (I) 2 , Ascunce N (E) 3 , Bisanti L (I) 4 , Bellisario C (I) 2 , Broeders M (NL) 5 , Cogo C (I) 6 , De Koning H (NL) 7 , Duffy S W (UK) 8 , Frigerio A (I) 2 , Giordano L (I) 2 , Hofvind S (N) 9 , Jonsson H (S) 10 , Lynge E (DK) 11 , Massat N (UK) 8 , Miccinesi G (I) 12 , Moss S (UK) 8 , Naldoni C (I) 13 , Njor S (DK) 11 , Nystrom L (S) 14 , Paap E (NL) 5 , Paci E (I) 12 , Patnick J (UK) 15 , Ponti A (I) 2 , Puliti D (I) 12 , Segnan N (I) 2 , Von Karsa L (D) 16 , Törnberg S (S) 17 , Zappa M (I) 12 , Zorzi M (I) 6
Affiliations:
1 Ministère du travail de l'emploi et de la santé, Paris, France
2 CPO-Piedmont, Turin, Italy
3 Navarra Breast Cancer Screening Programme. Pamplona, Spain
4 S.C. Epidemiologia, ASL di Milano, Italy
5 Radboud University Nijmegen Medical Centre & National Expert and Training Centre for Breast Cancer Screening, Nijmegen, The Netherlands
6 Veneto Tumor Registry, Padua, Italy
7 Erasmus MC, Dept. of Public Health, Rotterdam, The Netherlands
8 Wolfson Institute of Preventive Medicine, Queen Mary University of London, London, UK
9 Cancer Registry of Norway, Research Department and Oslo and Akershus University College of Applied Science, Oslo, Norway
10 Department of Radiation Sciences, Oncology, Umeå University, Umeå, Sweden
11 Centre for Epidemiology and Screening, University of Copenhagen, Copenhagen, Denmark
12 ISPO Cancer Research and Prevention Institute, Florence, Italy
13 Regional Cancer Screening Center, Emilia-Romagna Region, Bologna, Italy
14 Department of Public Health and Clinical Medicine, Division of Epidemiology and Global Health, Umeå University, Umeå, Sweden
15 NHS Cancer Screening Programmes and Oxford University, UK
16 International Agency for Research on Cancer, Lyon, France
17 Stockholm Cancer Screening, Stockholm, Sweden
Acknowledgements
Financial support was provided by the National Monitoring Italian Centre (ONS) to host the EUROSCREEN meetings in Florence in November 2010 and March 2011 and for the supplement publication, and by the National Expert and Training Centre for Breast Cancer Screening, Nijmegen, The Netherlands, to host a meeting of the EUROSCREEN mortality working group in July 2011.
