Abstract
Objectives
We investigated whether changes in mammographic technique and screening policy have improved mammographic sensitivity, and elongated the mean sojourn time, since the introduction of biennial breast cancer screening in Nijmegen, the Netherlands, in 1975.
Methods
Maximum likelihood estimation, non-linear regression, and Markov Chain Monte Carlo simulation were used to estimate test sensitivity, mean sojourn time, and underlying breast cancer incidence in four time periods, covering 40 years of breast cancer screening in Nijmegen (1975–2012).
Results
Maximum likelihood estimation generated an estimated test sensitivity of approximately 90% and a mean sojourn time of around three years, while non-linear regression and Markov Chain Monte Carlo simulation both gave estimates of approximately 80% and four years. All three methods estimated a rise in the underlying breast cancer incidence over time, with approximately one more case per 1000 women per year in the final period compared with the first period.
Conclusions
The three methods showed a slightly higher mammographic sensitivity and a longer mean sojourn time in the last period, after the introduction of digital mammography. Estimates were more realistic for the more sophisticated methods, non-linear regression and Markov Chain Monte Carlo simulation, while the simple closed form approximation of maximum likelihood estimation led to rather high estimates for sensitivity in the early periods.
Introduction
Analyses of the national breast cancer screening program in the Netherlands have shown that breast cancer can be detected early and that breast cancer mortality has decreased since the introduction of screening, while estimates of overdiagnosis are considered acceptable. 1 The start of the Dutch breast cancer screening program in 1989 was preceded by pilot studies in Utrecht and Nijmegen.2,3 Since the introduction of biennial breast cancer screening in Nijmegen in 1975, the mammographic technique has improved, and has recently changed from analogue to digital. The hypothesis is that these technological advances, as well as other changes within the program (e.g. to increase the referral rate) led to an improved test sensitivity of mammography, and an elongation of the mean sojourn time. Sojourn time can be defined as the duration of the preclinical screen-detectable phase, i.e. the period during which a woman is asymptomatic but the breast cancer is detectable by mammography. 4 Test sensitivity is the probability that a woman with asymptomatic breast cancer undergoing mammographic screening during the preclinical detectable phase will have her breast cancer detected by the test. 4 As the effectiveness of breast cancer screening crucially depends on test sensitivity and mean sojourn time, these measures were estimated when the breast cancer screening program was introduced, but the estimates have not been updated following the technological advances in mammography and policy changes in the program. 5
The challenge of quantifying mean sojourn time is that it is not directly observable. It can, however, be estimated. The easiest method is to use tumour volume doubling times as a proxy, as mean sojourn time can be seen as a measure of tumour growth.6,7 Several methods, some simple and some complex, have been developed for the estimation of mean sojourn time based on screening frequency data and breast cancer prevalence and incidence data, which are easily obtainable. Methods described in the current literature include simple methods based on the ratio of the prevalence of the disease at the first screening examination to the expected annual incidence rate, 8 parametric models assuming a specific distribution for sojourn time,9,10 nonparametric methods with time split into discrete intervals, 11 simulation models with patient-level data, 12 and Markov models simulating the natural history of chronic diseases. 13 As there is no gold standard to estimate mean sojourn time, the reliability of all these estimation methods remains unclear. We therefore applied three different estimation methods to investigate whether they gave similar results, bearing in mind that similar results can still be incorrect. As our interest was in measuring the effect of technological advances in mammography and changes in policy over several decades, we wanted to obtain multiple estimates for test sensitivity and mean sojourn time. The most relevant change is the shift from analogue to digital mammography in 2007, and its accompanying higher breast cancer detection rate. 14
In this paper, we investigate whether technological advances in mammography and changes in screening policy have improved the test sensitivity of mammography and elongated the correlated mean sojourn time. We illustrate this by applying three different estimation methods, using almost 40 years of data from the Nijmegen breast cancer screening program.
Methods
A pilot study on biennial breast cancer screening with analogue mammography started in 1975 in the city of Nijmegen, the Netherlands. 15 After promising results from this pilot, and another in Utrecht, the Dutch government decided to implement nationwide biennial breast cancer screening,2,3,16 and from 1989 the Nijmegen program became part of the national program. The Dutch government was responsible for the execution and quality control of the program and the training of the radiologists and radiographers. After the implementation period of the national program, the referral rate was 10 per 1000 women screened, with a somewhat disappointing breast cancer detection rate. The result triggered an investigation into the optimal referral rate in the early 2000s. 17 Based on this ‘optimization study,’ the Dutch Expert Centre for Screening recommended an increase in the referral rate to 20 per 1000 women screened. In 2007, the breast cancer screening program in Nijmegen switched from analogue to digital mammography. 18 This further increased the referral rate to around 25 per 1000 women screened. 19
During the 40 years of breast cancer screening in Nijmegen, women in varying age ranges were invited, but women aged 50–69 were consistently invited. For these women, the following data were collected: invitation for screening, participation in screening, referral for further diagnostic work-up, and diagnosis of a screen-detected cancer or interval cancer (a cancer detected between two consecutive screening rounds). These data were used to estimate test sensitivity of mammography and mean sojourn time. To investigate the effect of changes in mammographic technique and screening policy on these two parameters, the data of the Nijmegen breast cancer screening program (1975–2012) were grouped in four periods: (1) pilot study in Nijmegen (1975–1988); (2) introduction of nationwide breast screening program (1989–2000); (3) publication of study on increasing the Dutch referral rate (2001–2006); and (4) introduction of digital mammography (2007–2012). For each of these periods we obtained estimates of test sensitivity, mean sojourn time, and the underlying breast cancer incidence. Because breast cancer screening in the Netherlands was implemented more than 25 years ago (1989), there was no suitable control group or other reliable estimate of the underlying breast cancer incidence available.
We here describe three methods for estimating test sensitivity of mammography and mean sojourn time based on modeling routine screening outcome data.
Method 1: based on maximum likelihood estimation
Test sensitivity, mean sojourn time, and underlying incidence were estimated based on empirical screening data. To determine the test sensitivity of mammography we calculated the ratio of the number of screen-detected cancers to the number of interval cancers diagnosed in the first year after screening, plus all screen-detected cancers. The assumption was made that interval cancers detected in the first year after screening were missed cancers from the previous screen. 20
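The closed-form sensitivity estimate described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name and the example counts are hypothetical, and the counts are not taken from Table 1.

```python
def closed_form_sensitivity(screen_detected: int, interval_first_year: int) -> float:
    """Share of cancers present at a screen that the screen detected.

    Assumes, as stated in the text, that interval cancers diagnosed in the
    first year after a screen were missed at that screen.
    """
    total = screen_detected + interval_first_year
    if total == 0:
        raise ValueError("no cancers observed; sensitivity is undefined")
    return screen_detected / total


# Hypothetical counts, for illustration only.
estimate = closed_form_sensitivity(screen_detected=90, interval_first_year=10)
print(f"estimated sensitivity: {estimate:.2f}")
```

Because the estimate depends only on two counts, it is easy to compute, but it also inherits any error in the assumption that all first-year interval cancers were missed at the previous screen.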
Let λ1 be the underlying incidence of preclinical disease, λ2 be the rate of disease progression from preclinical to clinical phase, and S the screening test sensitivity (of mammography). Let t be the interval between screens in years. At first screen, the expected proportion of persons found to have cancer would be
At incident screens, the expected proportion is
21
The first term within the brackets is the probability of cancers newly arising in the preclinical phase since the last screen and not progressing to clinical disease before the next screen. The second term is the probability of preclinical cancers missed at the previous screen which have not progressed to the clinical phase within the screening interval. There are two simplifying approximations here. The first is that the first term of the formula applies if the previous screen was also a subsequent (not a first) screen. Literature has shown that interval cancer rates after a subsequent screen are similar to interval cancer rates after a first screen. 22 Applying this universally, regardless of the status of the previous screen, is therefore arguably a reasonable approximation. The second approximation is the absence of terms for cancers missed at screens before the last screen. Here, we assume that if a cancer is missed at a screen, it will progress to clinical disease in the subsequent interval, be detected at the subsequent screen, or progress to clinical disease in the interval following the subsequent screen. This is an approximation; however, we assume that the probability of a cancer being missed at two successive screens is relatively small. 23
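The two bracket terms described above can be sketched numerically. The sketch rests on assumptions that are not confirmed by the source: sojourn times are exponentially distributed with rate λ2, incidence λ1 is constant, and the preclinical prevalence at the previous screen is approximated by λ1/λ2. All function names and parameter values are hypothetical.

```python
import math


def new_cases_still_preclinical(lam1: float, lam2: float, t: float) -> float:
    """First term: cancers arising since the last screen that are still
    preclinical at the next screen. Closed form of the integral of
    lam1 * exp(-lam2 * (t - u)) over u from 0 to t."""
    return (lam1 / lam2) * (1.0 - math.exp(-lam2 * t))


def missed_cases_still_preclinical(lam1: float, lam2: float, sens: float, t: float) -> float:
    """Second term: cancers missed at the previous screen (assumed
    prevalence lam1 / lam2, missed with probability 1 - sens) that have
    not progressed to clinical disease within the interval t."""
    return (1.0 - sens) * (lam1 / lam2) * math.exp(-lam2 * t)


def expected_incident_screen_proportion(lam1: float, lam2: float, sens: float, t: float) -> float:
    """Sensitivity times the sum of the two bracket terms."""
    return sens * (new_cases_still_preclinical(lam1, lam2, t)
                   + missed_cases_still_preclinical(lam1, lam2, sens, t))


def numeric_first_term(lam1: float, lam2: float, t: float, steps: int = 100_000) -> float:
    """Midpoint-rule integration of the first term, as a check on the closed form."""
    h = t / steps
    return sum(lam1 * math.exp(-lam2 * (t - (i + 0.5) * h)) * h for i in range(steps))


# Hypothetical values: 3 new preclinical cases per 1000 women per year,
# mean sojourn time of 3 years, sensitivity 85%, biennial screening.
lam1, lam2, sens, t = 0.003, 1.0 / 3.0, 0.85, 2.0
print(expected_incident_screen_proportion(lam1, lam2, sens, t))
```

Under these assumptions the sum of the two terms simplifies algebraically to (λ1/λ2)(1 − S·exp(−λ2·t)), which gives a quick consistency check on any implementation.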
The above formula with calculus then solves to
The expected proportion of screen-negative subjects having a clinical interval cancer before the next screen (making the same approximations as for subsequent screens) is
Given these probabilities based on the three formulae for P, I, and C, specific formulae for the log-likelihoods, separately for first and subsequent screens, were specified. After substituting the closed form estimates of S as above, maximum likelihood estimation (MLE) on the total log-likelihood was performed to obtain point estimates for λ1 and λ2. The point estimates of λ1 and λ2 were used to calculate the expected values of P, I, and C. The 95% confidence intervals around these expected values were approximated and calculated by the following formula
Method 2: based on non-linear regression
If test sensitivity, calculated under the assumption that interval cancers in the first year after screening are missed cancers, does not adequately describe the real test sensitivity of mammography, a more complex estimation method is needed to estimate the three parameters at once. A three-state Markov model can then be applied to depict the progression of breast cancer through the states free from breast cancer (state 0), preclinical disease (state 1), and clinical disease (state 2). Breast cancers detected at screens were those in the preclinical detectable phase (PCDP), and interval breast cancers were in the clinical phase (CP).24,25 Let the underlying incidence of preclinical disease and the rate of disease progression from preclinical to clinical phase be denoted as above by λ1 and λ2, respectively. The intensity matrix of the three-state model is thus
$$Q = \begin{pmatrix} -\lambda_1 & \lambda_1 & 0 \\ 0 & -\lambda_2 & \lambda_2 \\ 0 & 0 & 0 \end{pmatrix}$$
With the following definitions
The probabilities of observing preclinical cancers and subjects free from breast cancer in the prevalent screening round are thus
The parameters were estimated by non-linear regression (NLR), with the expected numbers of cancers at first and subsequent screens and the expected numbers of interval cancers, based on the above formulae, as the regression predictors and the observed numbers as the dependent variable. 24
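As a sketch of the three-state model described above, the following code builds the intensity matrix for a progressive model with constant rates λ1 (state 0 to 1) and λ2 (state 1 to 2) and evaluates the probabilities of occupying each state at time t for a woman starting free from breast cancer. The closed-form solution is the standard one for this kind of model and is an assumption here, not a formula copied from the source; the function names and numeric values are hypothetical.

```python
import math


def intensity_matrix(lam1: float, lam2: float) -> list:
    """Rows and columns: state 0 (free), 1 (preclinical), 2 (clinical)."""
    return [[-lam1, lam1, 0.0],
            [0.0, -lam2, lam2],
            [0.0, 0.0, 0.0]]


def state_probabilities_from_healthy(lam1: float, lam2: float, t: float) -> tuple:
    """Probabilities of being in states 0, 1, 2 at time t, starting in state 0.

    Uses the closed-form solution of the forward equations, which
    requires lam1 != lam2.
    """
    if lam1 == lam2:
        raise ValueError("closed form used here requires lam1 != lam2")
    p00 = math.exp(-lam1 * t)
    p01 = lam1 / (lam2 - lam1) * (math.exp(-lam1 * t) - math.exp(-lam2 * t))
    p02 = 1.0 - p00 - p01
    return p00, p01, p02


def mean_sojourn_time(lam2: float) -> float:
    """With a constant progression rate, the mean time spent preclinical is 1 / lam2."""
    return 1.0 / lam2


# Hypothetical rates: 3 per 1000 women per year, mean sojourn time of 4 years.
p_free, p_preclinical, p_clinical = state_probabilities_from_healthy(0.003, 0.25, 2.0)
print(p_free, p_preclinical, p_clinical)
```

Expected counts for a regression or likelihood fit would then combine such state probabilities with the test sensitivity and the numbers of women screened.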
Method 3: based on Markov Chain Monte Carlo simulation
We also applied Markov Chain Monte Carlo simulation (MCMC) to estimate the parameters from the model described above. The same formulae for probabilities and expectations were used, but the parameters were assigned vague prior distributions and estimated in a Bayesian framework using MCMC. 26 A Gibbs sampler was used to draw samples from a stationary posterior distribution, from which inferences on the parameters were drawn. A burn-in of 10,000 iterations, followed by 15,000 iterations with a thinning interval of 3, was used, which yielded a total of 5000 posterior samples.
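The burn-in and thinning arithmetic can be sketched as follows, reading the description as 15,000 iterations run after a 10,000-iteration burn-in (an interpretation, since the text does not state this explicitly). The chain here is a stand-in list of iteration indices, not output from a real sampler, and the function name is hypothetical.

```python
def retained_draws(chain: list, burn_in: int, thin: int) -> list:
    """Discard the first `burn_in` draws, then keep every `thin`-th draw."""
    return chain[burn_in::thin]


burn_in, post_burn_in, thin = 10_000, 15_000, 3

# Stand-in for sampler output: one entry per iteration.
chain = list(range(burn_in + post_burn_in))

kept = retained_draws(chain, burn_in, thin)
print(len(kept))  # 5000
```

Thinning reduces autocorrelation between retained draws at the cost of discarding two of every three post-burn-in iterations.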
Results
The numbers of invited, screened, and referred women over the four periods, as well as the numbers of screen-detected and interval cancers, are presented in Table 1. The attendance rate steadily increased over the four periods. The referral rate was stable in periods 1 and 2, but increased during the last two periods, to 29 per 1000 women screened. The first rise in the referral rate coincided with the optimization study and the recommendation to increase referral. 17 The second rise was seen after the introduction of digital mammography. With digital mammography (period 4), the screen-detected cancer rate increased from 5 to almost 7 tumours per 1000 women screened.19,27 The number of screen-detected ductal carcinomas in situ was also higher; however, the interval cancer rate remained stable over the four periods.
Table 1. Number of invited, screened, and referred women (aged 50–69) and number of screen-detected and interval-detected cancers for all screens and first and subsequent screens separately in the Nijmegen Breast Cancer Screening Program in the period 1975–2012.
a In parentheses, the number of ductal carcinoma in situ (DCIS) out of the total number of cancers.
Table 2 shows the estimates for test sensitivity, the underlying breast cancer incidence, and mean sojourn time over the four periods calculated by each of the three methods (MLE, NLR, MCMC). For all methods, the estimated underlying breast cancer incidence rose by approximately 1 case per 1000 women per year in the final period compared with the first. The calculated test sensitivity of mammography using MLE was around 90%. In contrast, the estimated test sensitivity using NLR and MCMC was higher in period 4 (86% and 79%, respectively) compared with the previous periods. The estimates of mean sojourn time remained fairly stable over the first three periods, whereas the point estimates of all methods showed a small increase from period 3 to period 4 (MLE: 2.4 vs. 3.3 years, NLR: 3.6 vs. 4.4 years, MCMC: 4.3 vs. 4.6 years). The mean sojourn time estimated based on MLE was around three years, while the estimates of NLR and MCMC were closer to four years.
Table 2. Estimation of mammography sensitivity (S), mean sojourn time (MST), and underlying breast cancer incidence (λ1), using three estimation methods applied to data from women aged 50–69 invited to participate in the Nijmegen Breast Cancer Screening Program in the period 1975–2012.
MLE: maximum likelihood estimation; NLR: non-linear regression; MCMC: Markov Chain Monte Carlo simulation.
Within parentheses, 95% confidence interval.
Discussion
Our analysis over 40 years of breast cancer screening in Nijmegen showed a trend towards higher test sensitivity and longer mean sojourn time after the introduction of digital mammography. The three methods used for obtaining estimates of mammography test sensitivity and mean sojourn time gave comparable results. However, the more sophisticated methods, NLR and MCMC, which also estimated test sensitivity from empirical screening data, showed more realistic estimates.
The three methods applied in this study have been used previously to estimate test sensitivity and mean sojourn time, but they have not previously been used to investigate these parameters over time, nor have they been directly compared with one another. Our estimates of mean sojourn time during the period of analogue mammography (periods 1–3) are comparable with previous results.24,25 In contrast, the results of period 4 are difficult to compare with what is known, because to our knowledge there are no other estimates of mean sojourn time based on digital mammography. The slight increase in test sensitivity after the introduction of digital mammography has been found previously.19,27,28
A major strength of our study is that we were able to investigate the consequences of changes in the screening program on important parameters for the screening interval over a 40-year follow-up period in a well-documented screening program. As the Nijmegen screening program has been running for decades, there was no group of uninvited women and no reliable estimate of the underlying breast cancer incidence available. Therefore, the underlying breast cancer incidence also needed to be estimated. The estimated underlying breast cancer incidence showed an upward trend over the four periods. A similar trend in breast cancer incidence was seen in women aged 45–49 in the Netherlands (unscreened population). 29 This can be explained by opportunistic screening and an increase in breast cancer risk factors. Furthermore, the stable interval cancer rates and higher screen detection rates support a real increase in the incidence, and not an increase in overdiagnosis. We also found that the results of the three methods correlated well, which may suggest that the estimates are accurate, also in comparison with previously published results for screening with analogue mammography.24,30
Another strength is that our approach can easily be applied to other (breast) cancer screening programs to investigate the impact of changes in a screening program on test sensitivity and sojourn time. The MLE method, with a very simple closed form approximation for sensitivity, gave rather high estimates for sensitivity in the early periods, but is fairly easy to use. Moreover, this method is more constrained than the other two methods and arguably gives confidence intervals that are too small, not reflecting the entire uncertainty in the data. It may be that the approximations used in this method were less accurate than those in the more sophisticated methods (NLR, MCMC). NLR assumes a Poisson distribution of observed total numbers with their expectations, which leads to very wide confidence intervals. MCMC takes account of the conditionalities of these distributions on the distributions of related variables, and could, therefore, be argued to model more closely the interrelationships in the data and the model parameters. Thus, MCMC has arguably the most realistic confidence interval estimates. As there is no gold standard for estimating mean sojourn time, it is difficult to judge which of these three estimation methods has the most reliable outcome. Furthermore, we chose methods that can be applied to routinely collected screening data instead of more complex data, such as tumour size for calculating tumour volume/doubling times.
An important limitation is the size of our dataset, as we use data from a breast cancer screening program in a single city in the Netherlands. Because of the limited number of cancers detected and wide confidence intervals, our results need to be interpreted with caution. Nonetheless, all three methods, especially NLR and MCMC, gave the impression that mean sojourn time was longer after the introduction of digital mammography. The point estimate of mean sojourn time is 1.2 to 1.3 years longer than when the screening program started (NLR: 3.1 vs. 4.4 years, MCMC: 3.4 vs. 4.6 years). This may be seen as the first piece of evidence towards considering a longer screening interval. To be more certain about our findings, validation of the results in a larger dataset would be the next step. Furthermore, as not all breast cancers have a mammographically detectable preclinical phase, we can estimate the mean sojourn time only as an average for the breast cancers that can be detected by mammography and those that cannot. Moreover, it is likely that the prognosis of the cancer in these two groups is different, which may also have affected the estimates of mean sojourn time. 31 The potential introduction of new or additional screening modalities, such as digital breast tomosynthesis, automated breast ultrasound, and breast MRI, will probably make it possible to estimate the mean sojourn time for a larger proportion of all breast cancers. Future research should also include estimating test sensitivity and mean sojourn time for subgroups of women at varying levels of breast cancer risk. This could underpin the length of the screening interval for subgroups of women in the move towards a risk-based breast cancer screening program, rather than the one-size-fits-all approach based on age alone.
Conclusion
This study shows that test sensitivity and mean sojourn time, while taking the underlying breast cancer incidence rate into account, can be investigated based on routinely available screening data, with more complex methods providing the most realistic outcomes.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
