Abstract
Background
The estimation of breast cancer screening sensitivity is a major aim in the quality assessment of screening programmes. The proportional incidence method for the estimation of the sensitivity of breast cancer screening programmes is rarely used to estimate the underlying incidence rates.
Methods
We present a method to estimate episode and programme sensitivity of screening programmes, based solely on cancers detected within screening cycles (excluding breast cancer cases detected at the prevalent screening round) and on the number of incident cases in the total target population (steady state). The assumptions, strengths and limitations of the method are discussed. An example of calculation of episode and programme sensitivities is given, on the basis of the data from the IMPACT study, a large observational study of breast cancer screening programmes in Italy.
Results
The programme sensitivity from the fifth year of screening onwards ranged between 41% and 48% of the total number of cases in the target population. At steady state episode sensitivity was 0.70, with a trend across age groups, with lowest values in women aged 50-54 years (0.52) and highest in those 65-69 (0.77).
Conclusions
The method is a very serviceable tool for estimating sensitivity in service screening programmes, and the results are comparable with those of other methods of estimation.
Introduction
Secondly, it is becoming increasingly difficult to produce reliable incidence rates ‘in the absence of screening’. The areas without a screening programme, where incidence rates could be estimated utilizing neighbouring screening programmes, are progressively reducing. Moreover, in the areas with long-lasting screening programmes the estimates of incidence rates based on trends built on the periods before the onset of screening are decreasingly reliable. Lastly, mammography is now widely used, even in the absence of organized screening programmes.
Programme sensitivity is the proportion of screen-detected cases within the total number of cases incident in the target population during a specific period of time. 1 Estimation of this sensitivity might be biased due to length and/or overdiagnosis bias and conditioned by (1) population coverage (not all the target population is invited for screening); (2) non-responders (not all invitees attend screening); and (3) performance (the screening test and/or the assessment may be poorly performed).
We here present a methodology and an example of calculation of sensitivity based solely on cancers detected within screening cycles (excluding breast cancer cases detected at the prevalent screening round) and on the number of incident cases in the total target population, as reported by cancer registries.
Methods
In order to measure programme sensitivity we adopted the principle of the steady state behaviour of a screening programme, and related formulas, first described by Eddy 5 in 1980.
The sensitivity estimate is based not on the screened cohort but on a ‘stable’ population, i.e. one in which the disease incidence rate and the distributions for the mammogram intervals do not change with time or with the patient's age. We consider a periodic screening programme 6 as one in which the test is offered in a repetitive pattern, and the shortest block of time, the period, is such that the number and timing of the tests delivered are the same for each block. Additionally, we consider the yield of new cases found during one screening programme period to be the proportion of people in the screened population who are first discovered, by whatever means, in the period to have the disease (where the means could be a test at the screening session or discovery by the patient in an interval).
The principle of the steady state behaviour of a screening programme is that, in a stable population, the expected yield of new cases discovered by whatever means in each period of a periodic screening programme is constant after the initial period (i.e. the first screening round). Furthermore, that yield is approximately equal to the incidence rate multiplied by the length of the period. 5 This can be proved mathematically, 5 but here we give a simplified graphical demonstration of the proof in the Appendix.
In general terms the consequence is that the expected number of cases in the absence of screening is equivalent to the sum of the cases screen-detected in steady state, the interval cancers and the cases detected by other modalities (Others). The equivalence holds only in steady state, i.e. from about 3-4 years after the start of the screening programme.
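The estimate of programme sensitivity in this steady state population, i.e. excluding the first screening, is given by

Programme sensitivity = SD_L / (SD_L + IC + Others)

where SD_L is the number of cases screen-detected at later (repeated) screening tests, IC is the number of interval cancers and Others is the number of cases detected by other modalities.

The episode sensitivity estimate in steady state is

Episode sensitivity = SD_L / (SD_L + IC)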
Comparing episode and programme sensitivity formulas, it appears that the episode sensitivity is primarily the estimated measure of the average sensitivity for the individual woman, whereas the programme sensitivity is a characteristic of the studied dynamic population that indicates the proportion of breast cancer cases which were detected at screening.
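The two estimates can be illustrated with a minimal Python sketch. It assumes that programme sensitivity is the ratio SD_L / (SD_L + IC + Others) and episode sensitivity is SD_L / (SD_L + IC), as implied by the definitions in the text; the function names are hypothetical, and `others` here is taken to exclude interval cancers. The episode example uses counts back-calculated (and rounded) from the percentages reported in the Results, while the programme example uses purely hypothetical steady-state counts.

```python
def programme_sensitivity(sd_l, ic, others):
    """Share of all incident cases that were screen-detected at a later round.

    sd_l   -- cases screen-detected at repeated (non-prevalent) tests
    ic     -- interval cancers
    others -- cases detected by other modalities (assumed to exclude IC)
    """
    total = sd_l + ic + others
    if total <= 0:
        raise ValueError("at least one case is required")
    return sd_l / total


def episode_sensitivity(sd_l, ic):
    """Share of cases arising within screening cycles that were screen-detected."""
    total = sd_l + ic
    if total <= 0:
        raise ValueError("at least one case is required")
    return sd_l / total


# Rounded counts derived from the reported percentages (an assumption):
# 29.4% of 12,935 cases -> about 3803 SD_L; 71.7% of 2251 cases -> about 1614 IC.
episode = episode_sensitivity(3803, 1614)

# Hypothetical steady-state counts, for illustration only.
programme = programme_sensitivity(sd_l=450, ic=190, others=360)

print(f"episode sensitivity:   {episode:.2f}")
print(f"programme sensitivity: {programme:.2f}")
```

The episode figure obtained from the derived counts is close to the plateau of 0.70 reported in the Results, which is reassuring about the reading of the formula but is not a substitute for the original cycle-specific data.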
The estimate of sensitivity is then possible in areas where a cancer registry collects all incident breast cancer cases in the whole population. All breast cancer cases should be classified by method of diagnosis, i.e. as screen-detected at first or repeated screening test and at least as non-screen-detected. Moreover, if SD and IC cases are classified by screening round, episode sensitivity can be estimated by screening cycle allowing monitoring of changes in sensitivity with time.
Application to the IMPACT Study data
The IMPACT Study includes breast cancers diagnosed between 1988 and 2006 in women aged 40-79 years who were resident in 21 Italian areas. The characteristics of both the breast cancer screening programmes and main performance indicators have been described in detail. 7
Breast cancers were included in accordance with the International Agency for Research on Cancer rules for cancer registration. 8 In situ carcinomas were included, but death certificate only cases and multiple primaries were excluded.
All registry-based breast cancer cases were linked to the screening file and divided up by detection method. We classified cases as either screen-detected at the first screening test, at a later screening test or not screen-detected, defined as Others. The latter included cases diagnosed among the never responders, not-yet-invited, as well as the cases diagnosed clinically outside the screening process following a negative screening test (i.e. interval cancer cases diagnosed within the 2-year interval and irregular attendees).
We estimated expected cases in the absence of screening by modelling a pooled annual trend of incidence in the prescreening period to predict rates for each area using a multistep process described in detail elsewhere. 9 Statistical significance of the observed to expected ratio (O/E) was assessed through 95% confidence intervals (CIs) calculated using Byar's approximation. 10
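As an illustration of the interval calculation, the following Python sketch applies Byar's approximation to a Poisson-distributed observed count and divides the limits by the expected count to obtain a 95% CI for O/E. The function name and the example counts are hypothetical; the sketch treats the expected count as fixed, which is a simplifying assumption.

```python
import math


def byar_oe_ci(observed, expected, z=1.96):
    """Return (O/E, lower, upper) using Byar's approximation to the Poisson limits."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    if observed > 0:
        lower_count = observed * (
            1 - 1 / (9 * observed) - z / (3 * math.sqrt(observed))
        ) ** 3
    else:
        lower_count = 0.0  # the lower limit is zero when no cases are observed
    shifted = observed + 1
    upper_count = shifted * (
        1 - 1 / (9 * shifted) + z / (3 * math.sqrt(shifted))
    ) ** 3
    return observed / expected, lower_count / expected, upper_count / expected


# Hypothetical example: 100 observed cases against 100 expected.
ratio, lower, upper = byar_oe_ci(100, 100)
print(f"O/E = {ratio:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

An O/E ratio would be judged statistically different from 1 when the interval excludes 1.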
Programme and episode sensitivity were calculated for invasive only and invasive + in situ cases by age class (50-54, 55-59, 60-64 and 65-69 years).
Results
Areas whose screening programmes had been running for at least seven years were included in this analysis (n = 9), with a total of 12,935 cases, of which 10.5% were in situ (n = 1354) (Table 1). Of these, 29.4% were screen-detected at repeated test (SD_L) and 17.4% were diagnosed outside the screening programme after a negative screening test (n = 2251). In total, 71.7% of the latter had been detected up to 730 days after the last negative mammography (IC), and 28.3% after a longer interval.
Table 1. IMPACT study
Figure 1 shows the O/E ratio (with 95% CIs) of invasive and in situ cases in the first eight years after screening start, which is near 1 in steady state.
Figure 1. IMPACT study. Observed/expected invasive and in situ cases by years of screening, with 95% CI
Table 2 shows the trend in cumulative episode sensitivity and annual programme sensitivity estimates from year 3 to year 8 of screening activity.
Table 2. IMPACT study
The programme sensitivity at steady state (from the fifth year of screening onward) ranged between 41% and 48% of the total number of cases in the target population.
Episode sensitivity progressively increased with time and reached a plateau of 0.70 at eight years after screening start. Sensitivity for invasive cases alone was slightly lower (0.68 - data not shown).
Episode sensitivity showed a clear trend across age groups, with lowest values in women aged 50-54 years (0.52 at year 8) and highest in those 65-69 (0.77) (Figure 2).
Figure 2. IMPACT study. Trends in episode cumulative sensitivity estimates from year 3 to year 8 of screening activity, by age class (invasive and in situ cases)
Discussion
We here present a method to estimate the sensitivity of screening programmes applied to a target screening population in steady state. This method represents a very serviceable tool to estimate the sensitivity of service screening programmes, because it does not depend on cohorts of screened women and hard-to-estimate data such as the underlying incidence trends (in the absence of screening).
The use of the steady state ratio screen-detected/(screen-detected + interval cases) 11 also largely overcomes the biases that affect the traditional method to calculate sensitivity. Slow-growing cancers (length bias), which would surface as symptomatic beyond the planned rescreening interval, or which would never become symptomatic during a lifetime (overdiagnosis), are likely to be included among cases screen-detected at prevalence screening, and so are excluded from this estimate. The results from the randomized clinical trials, reviewed by Sue Moss 12 in 2005, showed that in screening trials where the control group was screened at the end of the study period, the excess of incidence - and possible overdiagnosis - was practically absent.
Some of the assumptions of the model have been partially violated. Firstly, the model assumes a stable population in which the disease incidence rate and the screening intervals do not change with time or with the patient's age. However, in practice, incidence rates are increasing independently of the effect of service screening programmes. The yield of cases detected at a new screening round exceeds the expected number of cases, since the number of cases whose diagnosis is anticipated is greater than those stolen by the previous test. However, it is plausible that this only marginally affects the estimates. We carried out a sensitivity analysis including the 1.7% annual increase of breast cancer incidence in Italy 8 and obtained a 0.5% decrease of the sensitivity estimate.
Additionally, because breast cancer incidence increases with age and SD_L cases are diagnosed at the end of the screening cycle, the sensitivity estimate will be artificially increased. It is plausible, though, that this effect is limited.
Secondly, the model assumes a fixed period of the screening cycle. However, there is a caveat: if the period is much longer than two years, later screening will tend to resemble a prevalence screen and there will be an increase of the numerator in the sensitivity formula. The variation in screening interval within the IMPACT study data had only a small effect on sensitivity. To estimate the episode sensitivity of a programme given the length of its period, we carried out a sensitivity analysis utilizing the IMPACT study database. We compared the observed sensitivity with that estimated by adding to the ICs the cases emerging at different intervals from the previous negative mammogram. When cases diagnosed up to 27 months were considered, sensitivity at year 8 after screening start was reduced by 2.4%, to 0.68. The sensitivity dropped by 3.7%, to 0.67, when including cases diagnosed up to 30 months after a negative screening. Of course this small effect is related to episodic variations in the interscreening interval, while systematic differences of the interval are expected to be adequately evidenced by the method.
Thirdly, the model assumes 100% compliance with later screening tests. If compliance is suboptimal, the number of detected SD_L cases will be lower than expected and, as a consequence, sensitivities will be underestimated. Under the assumption that the detection rates in women attending later screening tests and in non-attenders would be similar, the effect of non-attendance on the reduction in sensitivity can be estimated. A direct method to overcome this problem is to divide the number of SD cases by the estimated attendance rate at the second screen.
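The correction for attendance can be sketched as follows. This is only one possible reading of the adjustment: the screen-detected count is divided by the attendance rate and the interval cancer count is left unchanged. The function name, the attendance rate of 0.9 and the rounded counts (derived from the percentages in the Results) are assumptions made for illustration.

```python
def attendance_adjusted_episode_sensitivity(sd_l, ic, attendance_rate):
    """Episode sensitivity after scaling screen-detected cases for non-attendance.

    Assumes non-attenders would have had the same detection rate as attenders.
    """
    if not 0 < attendance_rate <= 1:
        raise ValueError("attendance_rate must be in (0, 1]")
    sd_adjusted = sd_l / attendance_rate  # cases expected with full attendance
    return sd_adjusted / (sd_adjusted + ic)


# Rounded counts derived from reported percentages; the 0.9 rate is hypothetical.
unadjusted = attendance_adjusted_episode_sensitivity(3803, 1614, 1.0)
adjusted = attendance_adjusted_episode_sensitivity(3803, 1614, 0.9)
print(f"unadjusted: {unadjusted:.3f}, adjusted for 90% attendance: {adjusted:.3f}")
```

Under these assumptions the adjusted estimate is always at least as high as the unadjusted one, in line with the statement that suboptimal compliance leads to underestimation.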
It should be pointed out that ICs diagnosed in women aged 70+ years must be excluded from the estimate, because they belong to an open-ended set, which is not closed with a subsequent screening test. Since sensitivity is greatest in the elderly, this leads to an underestimate of the overall sensitivity.
A proportion of ICs for some reason is not detected during the interval but is diagnosed at the subsequent screening episode. Our method does consider these cases as ‘successes’ and thus sensitivity is overestimated. To quantify this effect we reclassified the screen-detected cases that were pT3+ at diagnosis as interval cancers and observed a 4% reduction in sensitivity to 0.67 after eight years of screening.
It should also be recognized that the opposite may take place: some asymptomatic cancers are detected in women who spontaneously undergo a mammography during the interval. These cancers are labelled as failures (interval) instead of successes (detection at screening). The size of this effect is directly associated with the spread of interscreening examinations, which can vary widely from place to place. In the IMPACT Study areas, we recoded the interval cases that were pT1b or less as screen-detected and obtained an 8.6% increase in sensitivity, up to 0.76 at the eighth year of screening. This distortion clearly affects all methods to estimate sensitivity.
According to our estimates, the screening programmes included in the IMPACT study reported an overall sensitivity of 0.70 eight years after screening start. This result is in accordance with the standard given in European Guidelines for interval cancer rate as a proportion of the background incidence rate (30% in the first 11 months of interval and 50% in months 12-23; on average, 40%), 13 and it is in agreement with many estimates in Italian programmes produced using the proportional incidence method, 14–18 although not with all. 19
Several papers have been published on sensitivity of breast cancer screening programmes in other countries, using both the proportional incidence method 20–31 and other models. 32–34 As shown in a recent paper by Törnberg et al. 35 who reported the results of programmes from six European countries, the comparison is very difficult because programmes have different characteristics (age group targeted, number of views, participation rates, etc.) that may deeply affect their sensitivity.
We observed a reduction in sensitivity with decreasing age as reported in many different studies.17,36–42 This trend has been attributed mostly to the higher proportion of dense breasts in younger women.43,44
In conclusion, the method produced reliable estimates of programme and episode sensitivity in steady state populations, i.e. when a service screening programme is mature after the first enrolment and prevalence screening period. The estimate is possible using cancer registry data and knowledge of diagnostic modality. Better knowledge of the contribution of the diagnostic true false-negative interval cancer cases in the estimate of sensitivity is possible if all breast cancer cases in the target population are classified by diagnostic modality in relation to screening.
IMPACT working group
E Paci, P Falini, D Puliti, I Esposito, M Zappa, E Crocetti (Clinical and Descriptive Epidemiology Unit - ISPO - Cancer Prevention and Research Institute, Firenze); S Ciatto (Department of Diagnostic Imaging - ISPO - Cancer Prevention and Research Institute, Firenze); C Naldoni, AC Finarelli, P Sassoli de Bianchi (Screening Programme, Department of Health, Regione Emilia-Romagna, Bologna); S Ferretti (Ferrara Cancer Registry, Dipartimento di Medicina Sperimentale e Diagnostica, Sezione di Anatomia, Istologia e Citologia Patologica, Università di Ferrara, Ferrara); GP Baraldi (Breast Cancer Screening Programme, Ferrara); M Federico, C Cirilli (Modena Cancer Registry, Modena); R Negri (AUSL, Modena); V De Lisi, P Sgargi (Parma Cancer Registry, Parma); A Traina, M Zarcone (Department of Oncology, ARNAS Ascoli, Palermo); A Cattani, N Borciani (AUSL, Reggio Emilia); L Mangone (Registro Tumori di Reggio Emilia, Dipartimento di Sanità Pubblica, AUSL, Reggio Emilia); F Falcini, A Ravaioli, R Vattiato, A Colamartini (Romagna Cancer Registry, IRST, Forli); M Serafini, B Vitali, P Bravetti (AUSL, Ravenna); F Desiderio, D Canuti, C Fabbri (AUSL, Rimini); A Bondi, C Imolesi (AUSL, Cesena); N Collina, P Baldazzi, M Manfredi, V Perlangeli, C Petrucci, G Saguatti (AUSL, Bologna); N Segnan, A Ponti, G Del Mastro, C Senore, A Frigerio, S Pitarella (CPO Piemonte, AO San Giovanni Battista, Torino); S Patriarca, R Zanetti (Registro Tumori Piemonte, CPO Piemonte, AO San Giovanni Battista, Torino); M Vettorazzi, M Zorzi (Venetian Tumour Registry, Istituto Oncologico Veneto - IOV IRCCS, Padova); A Molino, A Mercanti (Università di Verona, Verona); F Caumo (Azienda ULSS, Verona); R Tumino, A Sigona (Registro Tumori, UO Anatomia Patologica, Azienda Ospedaliera Civile MP Arezzo, Ragusa); G La Perna, C Iacono (ONCOIBLA e UO di Oncologia, AO Ragusa, Ragusa); F Stracci, F La Rosa (Registro Tumori Umbro, Perugia); M Petrella, I FuscoMoffa (Epidemiology Unit, ASL2, Perugia).
Footnotes
None declared. This study was supported in part by research grants from the Italian Ministry of Health, Regione Abruzzo, and the Italian League against Cancer, Rome. The sponsors had no role in the collection, analysis and interpretation of data, in the writing of this report, and in the decision to submit the manuscript for publication.
Acknowledgment
Thanks to Nigel Barton for the English revision of the draft.
Graphical Demonstration of the Principle of the Steady-State Behaviour of a Screening Programme
Consider a population of n women at a single moment of time. Breast cancers in a preclinical phase will be present and unknown (shaded area in Figure A1). These cancers are at different states of development or growth - some being very immature, and others very mature. We can imagine the cancers arrayed as in Figure A1 according to their states. If the subgroup of women affected by a cancer simultaneously undergoes a diagnostic test (e.g. a mammogram), the probability that the test will be positive depends on the state of development of that cancer.
Principle of the steady-state behaviour. Graphical demonstration
It is reasonable to assume that the more mature the cancer, the higher the probability. In other words, this probability is a function of the state of development of the cancer and is represented in Figure A1 by the black curve (the ‘detection probability curve’). Now let us put the state of development on a timescale and pick an arbitrary reference point in the life of the lesion (t = 0) corresponding to the moment of incidence, defined as the moment the patient would seek care on her own, and draw the timescale before that point (Figure A2).
Let the annual incidence rate of cancers be R per n women years: in our population there are R whose cancers are in a development state between -1 and 0, and R more with cancers in states between -2 and -1, and so forth. In the absence of screening, the cancers that will be diagnosed in the following two years are those whose state is between -2 and 0 (shaded area in Figure A2).
Suppose that we screen these women with a mammogram and let the screening interval be two years. The light blue area under the detection probability curve in Figure A3 represents the number of cases we expect to detect. Screening is imperfect, so there are a number of cancers that are not picked up by mammography, but that will come to clinical attention in the following two years (the dark blue area in Figure A3). They are the interval cancers of the first screening round.
At the time of the first re-examination two years have passed and all of the women in Figure A3 have moved along the horizontal axis two units to the right (Figure A4). The results of the second mammogram are different, because some of the cancers in this population have already been diagnosed by the previous screening episode (labelled as ‘stolen’ in Figure A5). The light green area in Figure A5 represents the cases screen-detected at the second episode. The dark green area represents the interval cases of the second screening round.
We now want to show that the periodic yield of cancers detected by all methods after the initial examination is approximately equal to the biennial incidence rate. We can achieve this by showing that the sum of the areas in Figure A6 shaded light and dark green (respectively, cancers detected in the first re-examination and in the interval) equals the area inside the red rectangle based on the interval -2 to 0, which is the biennial incidence rate. As the two areas labelled as X and Y are common (Figure A7), and the equivalence of those labelled Z and W is shown in Figure A8 (where the two yellow areas are equal, and, so too, are the violet areas) the demonstration is complete.
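The same result can be checked numerically with a small deterministic sketch in Python. All ingredients here are assumptions chosen for illustration, not the authors' model: a linear detection probability curve that falls to zero six years before clinical surfacing, a peak detection probability of 0.9, an incidence of 100 cases per year, a two-year screening period and full attendance. The function names are hypothetical.

```python
def detection_probability(lead_time, max_lead=6.0, peak=0.9):
    """Chance that a screen finds a cancer `lead_time` years before it would surface."""
    if lead_time >= max_lead:
        return 0.0
    return peak * (1.0 - lead_time / max_lead)


def simulate_yields(rate=100.0, period=2.0, rounds=6, step=0.01, max_lead=6.0):
    """Yield (screen-detected + interval cases) for each screening round.

    Cancers are indexed by the time at which they would surface clinically,
    measured from the first screen, with `rate` cancers per year.
    """
    screen_detected = [0.0] * rounds
    interval = [0.0] * rounds
    n_points = int(round((rounds * period + max_lead) / step))
    for i in range(n_points):
        surfacing = (i + 0.5) * step      # midpoint of a small time slice
        undetected = rate * step          # cases in this slice
        for k in range(rounds):
            screen_time = k * period
            if screen_time > surfacing:
                break                     # already surfaced before this screen
            found = undetected * detection_probability(
                surfacing - screen_time, max_lead
            )
            screen_detected[k] += found
            undetected -= found
        # Cases missed by every earlier screen surface as interval cancers.
        k_interval = int(surfacing // period)
        if k_interval < rounds:
            interval[k_interval] += undetected
    return [sd + ic for sd, ic in zip(screen_detected, interval)]


yields = simulate_yields()
for round_index, cases in enumerate(yields):
    print(f"round {round_index}: {cases:.1f} cases (incidence x period = 200.0)")
```

In this sketch the prevalence round yields more cases than the two-year incidence, and from the third round onward the yield settles at the incidence rate multiplied by the period, as the graphical argument states. With a detection curve that never reaches zero the equality would hold only approximately.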
