Abstract

Objective

According to the Independent UK Panel on Breast Cancer Screening, the most reliable estimates of overdiagnosis for breast cancer screening come from stop-screen trials Canada 1, Canada 2, and Malmo. The screen-interval overdiagnosis fraction is the fraction of cancers in a screening program that are overdiagnosed. We used the cumulative incidence method to estimate screen-interval overdiagnosis fraction. Our goal was to derive confidence intervals for estimated screen-interval overdiagnosis fraction and adjust for refusers in these trials.

Methods

We first show that the UK Panel’s use of a 95% binomial confidence interval for estimated screen-interval overdiagnosis fraction was incorrect. We then derive a correct 95% binomial-Poisson confidence interval. We also use the method of latent-class instrumental variables to adjust for refusers.

Results

For the Canada 1 trial, the estimated screen-interval overdiagnosis fraction was 0.23 with a 95% binomial confidence interval of (0.18, 0.27) and a 95% binomial-Poisson confidence interval of (0.04, 0.41). For the Canada 2 trial, the estimated screen-interval overdiagnosis fraction was 0.16 with a 95% binomial confidence interval of (0.12, 0.19) and a 95% binomial-Poisson confidence interval of (−0.01, 0.32). For the Malmo trial, the estimated screen-interval overdiagnosis fraction was 0.19 with a 95% binomial confidence interval of (0.15, 0.22). Adjusting for refusers, the estimated screen-interval overdiagnosis fraction was 0.26 with a 95% binomial-Poisson confidence interval of (0.03, 0.50).

Conclusion

The correct 95% binomial-Poisson confidence intervals for the estimated screen-interval overdiagnosis fraction based on the Canada 1, Canada 2, and Malmo stop-screen trials are much wider than the previously reported incorrect 95% binomial confidence intervals. The 95% binomial-Poisson confidence intervals widen as follow-up time increases, an unappreciated downside of longer follow-up in stop-screen trials.

Keywords

Breast cancer screening, complier average causal effect, confidence interval, cumulative incidence method, excess incidence, latent class instrumental variables, noncompliance, overdiagnosis

Introduction

Overdiagnosis is the screen-detection of a preclinical cancer that would not have developed into clinically diagnosed cancer in the absence of screening. Because overdiagnosis causes unnecessary treatment, estimating the fraction overdiagnosed is important for the evaluation of cancer screening. Three measures for the fraction overdiagnosed are

the screen overdiagnosis fraction (SOF), the probability of overdiagnosis among screen-detected preclinical cancers,

the screen-interval overdiagnosis fraction (SIOF), the probability of overdiagnosis among cancers diagnosed in the screening program (preclinical cancers detected on screening and clinical cancers arising in subsequent intervals),

the program overdiagnosis fraction (POF), the probability of overdiagnosis among persons entering the screening program.

SOF is most relevant for patient decision-making, but estimation requires data on the number of screen detections, which may be difficult to obtain. Some investigators favor SIOF because it is less dependent on screening frequency than SOF.¹,² POF is relevant for policy analysts weighing harms and benefits of a screening program. We primarily discuss estimation of SIOF and briefly discuss the direct extension to SOF as well as estimation of POF.

A stop-screen trial randomizes participants to a no-screening control group or a screening group with follow-up after the last screen. Under the cumulative incidence method (CIM), the estimated SIOF (with equal numbers randomized to each group) is the excess cumulative number of cancers in the screened group versus the control group divided by the number of cancers diagnosed during the screening period.³,⁴ If follow-up is longer than the maximum lead time, the estimated SIOF is an unbiased estimate of the true SIOF.³,⁴

According to the Independent UK Panel on Breast Cancer Screening (the UK Panel),¹,² the “most reliable estimates of overdiagnosis” in breast cancer screening are CIM estimates from three stop-screen trials: Canada 1 ages 40–49, Canada 2 ages 50–59, and Malmo ages 50–69. The UK Panel estimated SIOF in these trials and reported a 95% binomial confidence interval (BCI) that makes an incorrect assumption about the variance. Jacklyn et al.,⁵ Baker et al.,⁶ and Nelson et al.⁷ also reported incorrect 95% BCIs for estimated SIOF or SOF. In the Methods section, we explain the problem with using the 95% BCI and derive a correct 95% confidence interval for estimated SIOF, which we call the binomial-Poisson confidence interval (BPCI).

Another important issue in estimating SIOF is adjusting for refusers, participants randomized to screening who do not receive screening. In the Malmo trial, approximately 30% of women invited for screening were refusers. The adjustment for refusers increases the estimated SIOF and the estimated variance of the estimated SIOF. To adjust for refusers, we used the one-sided version of the method of latent class instrumental variables.⁸ The general method was independently formulated in 1994 by Baker and Lindeman⁹ and Imbens and Angrist.¹⁰ It estimates a quantity often called the local average treatment effect (LATE)¹⁰,¹¹ or the complier average causal effect (CACE).¹² The one-sided version was independently formulated in 1983 by Baker in a technical report⁸ and in 1984 by Bloom.¹³ In the Methods section, we explain how it uses randomization and latent classes to avoid bias while requiring an assumption that almost certainly holds. The two-sided version adjusts for both refusers in the screened group and participants in the control group who immediately receive screening outside the trial. Although the more general two-sided method of latent class instrumental variables generally yields estimates identical to those from the deattenuated method of Newcombe¹⁴ (which Jacklyn et al.⁵ applied to overdiagnosis) and the method of Cuzick et al.,¹⁵ the assumptions differ.

We reanalyzed the data from the UK Panel using the one-sided method of latent class instrumental variables to adjust for refusers and using a corrected variance to obtain confidence intervals. We also compared the BPCI to bootstrap confidence intervals calculated in lung cancer screening trials.

Methods

For the specified follow-up after the time of the last screen, the estimated SIOF under the CIM is

\mathrm{SIOF}_{CIM} = (n_{1} - n_{0} r) / n_{S}

(1)

where

n₀ = cumulative number of clinical cancers in the control group,

n₁ = cumulative number of screen-detected preclinical cancers and clinical cancers in the screened group,

n_S = number of diagnosed cancers (screen-detected preclinical cancers and clinical cancers) in the screened group during the screening period,

r = ratio of sample sizes of the screened to control groups.

BCIs—an incorrect measure of uncertainty

The UK Panel reported a 95% BCI for the estimated SIOF, which implicitly assumed the following binomial estimated variance for SIOF_CIM,

\mathrm{Var}_{Bin}(\mathrm{SIOF}_{CIM}) = \mathrm{SIOF}_{CIM} (1 - \mathrm{SIOF}_{CIM}) / n_{S}

(2)

Equation (2) is an incorrect estimated variance for SIOF_CIM because the numerator of SIOF_CIM in equation (1) is not a realization from a binomial distribution. A binomial random variable is defined as the sum of independent Bernoulli random variables (random binary indicators for each individual). The numerator of SIOF_CIM is not a sum of realizations of independent Bernoulli random variables, so the use of an estimated binomial variance is incorrect.

A likely reason that some investigators use equation (2) for the estimated variance of SIOF_CIM is the mistaken view that SIOF_CIM is equivalent to SIOF_Bin = x/n_S, where x is the number of diagnosed cancers in the screening program that are overdiagnosed. (SIOF_Bin is not used because it is not possible to observe x.) Let Y_i denote a Bernoulli random variable, where Y_i = 1 if the ith diagnosed cancer in the screening program is overdiagnosed, with probability SIOF, and 0 with probability (1–SIOF). Let random variable X (with realization x) equal Σ_i Y_i, for i = 1, 2, … , n_S. Based on probability theory, X follows a binomial distribution with expected value n_S SIOF and variance n_S SIOF (1–SIOF). For this binomial distribution, the maximum likelihood estimate of SIOF is SIOF_Bin = x/n_S, and the estimated variance for the binomial distribution is SIOF_Bin (1–SIOF_Bin)/n_S, which has the same form as the estimated variance in equation (2) and is a correct variance for SIOF_Bin.

In summary, because the numerator of SIOF_CIM is a weighted difference in the cumulative numbers of cancers in the two randomization groups and not the sum of realizations of independent Bernoulli random variables (unlike the numerator of SIOF_Bin), the use of a 95% BCI for SIOF_CIM is incorrect.

BPCIs—a correct measure of uncertainty

To obtain the correct variance of the estimated SIOF, we first rewrite the estimated SIOF as

\mathrm{SIOF}_{CIM} = (n_{C} + n_{S} - n_{0} r) / n_{S}, \quad \text{where } n_{C} = n_{1} - n_{S}

(3)

We assume the cumulative number of cancers at a given time follows an independent Poisson distribution for each group, as specified by n₀ ∼ Poisson(μ₀) and n₁ ∼ Poisson(μ₁). A Poisson distribution assumes the occurrence of one event does not affect the probability that another event will occur, so it is appropriate for modeling the numbers of cancers arising each year after randomization. The cumulative number of cancers in each year after randomization follows a Poisson distribution because the sum of independent Poisson random variables is a Poisson random variable. Given the n₁ cumulative cancers in the screened group, we assume the number n_S who are diagnosed with cancer during the screening period follows a binomial distribution, as specified by n_S ∼ binomial(n₁, π). Under these assumptions, E(n₀) = μ₀, Var(n₀) = μ₀, E(n₁) = μ₁, Var(n₁) = μ₁, E(n_S|n₁) = n₁ π, and Var(n_S|n₁) = π (1–π) n₁. Applying the law of total covariance,

\begin{array}{l} \mathrm{Cov}(n_{S}, n_{C}) \\ = E\{\mathrm{Cov}(n_{S}, n_{C} \mid n_{1})\} + \mathrm{Cov}\{E(n_{S} \mid n_{1}), E(n_{C} \mid n_{1})\} \\ = E\{-\mathrm{Var}(n_{S} \mid n_{1})\} + \mathrm{Cov}\{E(n_{S} \mid n_{1}), E(n_{1} - n_{S} \mid n_{1})\} \\ = E\{-\pi (1 - \pi) n_{1}\} + \mathrm{Cov}\{n_{1} \pi, n_{1} - n_{1} \pi\} \\ = -\pi (1 - \pi) \mu_{1} + \pi (1 - \pi) \mu_{1} = 0 \end{array}

(4)

Because the counts from each randomization group are independent, Cov(n_S, n₀) = Cov(n_C, n₀) = 0. Based on the delta method with the covariances equal to 0, the variance of SIOF_CIM is

\begin{array}{l} \mathrm{Var}(\mathrm{SIOF}_{CIM}) = (\partial\, \mathrm{SIOF}_{CIM} / \partial n_{0})^{2}\, \mathrm{Var}(n_{0}) \\ + (\partial\, \mathrm{SIOF}_{CIM} / \partial n_{S})^{2}\, \mathrm{Var}(n_{S}) \\ + (\partial\, \mathrm{SIOF}_{CIM} / \partial n_{C})^{2}\, \mathrm{Var}(n_{C}) \end{array}

(5)

Applying the law of total variance,

\begin{array}{l} \mathrm{Var}(n_{S}) = E\{\mathrm{Var}(n_{S} \mid n_{1})\} + \mathrm{Var}\{E(n_{S} \mid n_{1})\} \\ = E\{\pi (1 - \pi) n_{1}\} + \mathrm{Var}(n_{1} \pi) \\ = \pi (1 - \pi) \mu_{1} + \pi^{2} \mu_{1} = \pi \mu_{1} = E(n_{S}) \end{array}

(6)

Equation (6) suggests estimating Var(n_S) by n_S. A similar equation suggests estimating Var(n_C) by n_C. For the Poisson distribution, the usual estimate of Var(n₀) is n₀. Substituting these variance estimates into equation (5), treating r as known (due to the large sample size), and simplifying yields the estimated variance of SIOF_CIM,

\mathrm{Var}_{est}(\mathrm{SIOF}_{CIM}) = \{n_{C}^{2} + n_{0} (n_{0} + n_{S}) r^{2} + n_{C} (n_{S} - 2 n_{0} r)\} / n_{S}^{3}

(7)
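The simplification behind equation (7) can be written out explicitly. The following is a sketch of the intermediate steps, obtained by differentiating equation (3) and substituting the variance estimates n₀, n_S, and n_C into equation (5); it adds no assumptions beyond those already stated.

```latex
\begin{aligned}
\frac{\partial\, \mathrm{SIOF}_{CIM}}{\partial n_{0}} &= -\frac{r}{n_{S}}, \qquad
\frac{\partial\, \mathrm{SIOF}_{CIM}}{\partial n_{C}} = \frac{1}{n_{S}}, \qquad
\frac{\partial\, \mathrm{SIOF}_{CIM}}{\partial n_{S}} = -\frac{n_{C} - n_{0} r}{n_{S}^{2}}, \\
\mathrm{Var}_{est}(\mathrm{SIOF}_{CIM})
&= \frac{r^{2} n_{0}}{n_{S}^{2}} + \frac{n_{C}}{n_{S}^{2}} + \frac{(n_{C} - n_{0} r)^{2}}{n_{S}^{3}} \\
&= \frac{n_{C}^{2} + n_{0} (n_{0} + n_{S}) r^{2} + n_{C} (n_{S} - 2 n_{0} r)}{n_{S}^{3}}.
\end{aligned}
```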

If r = 1 the variance in equation (7) reduces to

\mathrm{Var}_{est}(\mathrm{SIOF}_{CIM}) = \{(n_{0} - n_{C})^{2} + (n_{0} + n_{C}) n_{S}\} / n_{S}^{3}

(8)

The 95% BPCI is SIOF_CIM ± 1.96 Var(SIOF_CIM)^½.
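Equations (1), (3), (7), and (8) translate directly into a few lines of code. The following is a minimal sketch; the function name `siof_bpci` and the counts in the example are hypothetical and serve only to illustrate the arithmetic.

```python
import math


def siof_bpci(n0, n1, n_s, r=1.0, z=1.96):
    """CIM estimate of SIOF with its binomial-Poisson confidence interval.

    n0, n1 : cumulative cancers in the control and screened groups
    n_s    : cancers diagnosed in the screened group during screening
    r      : ratio of screened to control sample sizes (treated as known)
    """
    n_c = n1 - n_s                                   # equation (3)
    siof = (n1 - n0 * r) / n_s                       # equation (1)
    var = (n_c**2 + n0 * (n0 + n_s) * r**2
           + n_c * (n_s - 2 * n0 * r)) / n_s**3      # equation (7)
    half = z * math.sqrt(var)
    return siof, var, (siof - half, siof + half)


# Hypothetical counts, for illustration only.
n0, n1, n_s = 500, 600, 300
siof, var, (lo, hi) = siof_bpci(n0, n1, n_s)

# With r = 1, equation (7) should agree with equation (8).
n_c = n1 - n_s
var_eq8 = ((n0 - n_c) ** 2 + (n0 + n_c) * n_s) / n_s**3

print(f"SIOF = {siof:.3f}, 95% BPCI = ({lo:.3f}, {hi:.3f})")
print(f"equation (7) = {var:.6f}, equation (8) = {var_eq8:.6f}")
```

Keeping `r` as an argument means the same function covers trials with unequal randomization; with the default `r = 1` it reproduces the simpler form in equation (8).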

Adjusting for refusers

Another factor that affects variability in the overdiagnosis estimate, as well as the magnitude of the estimate itself, is noncompliance. An often-encountered form of noncompliance in cancer screening is refusers; i.e. individuals who are offered screening but do not receive screening. We adjust for refusers using the one-sided version of the method of latent class instrumental variables,⁸ applied to estimating the fraction overdiagnosed. Consider two latent classes: (i) never-takers who would not receive screening regardless of randomization group and (ii) compliers who would receive screening only if randomized to screening. For a population (ignoring sampling variability), let

q_complier1 = cumulative probability of screen-detected preclinical cancer or clinical cancer among compliers randomized to screening,

q_complier0 = cumulative probability of screen-detected preclinical cancer or clinical cancer among compliers randomized to no screening,

q_never-taker = cumulative probability of screen-detected preclinical cancer or clinical cancer among never-takers,

π = probability of being a complier, which, by virtue of the randomization, is the same in both randomization groups.

For a population, the cumulative probabilities of screen-detected preclinical cancer or clinical cancer in the control and screened groups are, respectively

q_{0} = \pi\, q_{complier0} + (1 - \pi)\, q_{never\text{-}taker}

(9)

q_{1} = \pi\, q_{complier1} + (1 - \pi)\, q_{never\text{-}taker}

(10)

An assumption in equations (9) and (10) is that q_never-taker does not vary by randomization group, which almost certainly holds because refusing screening implies that never-takers in both groups have the identical experience of no screening, and it is unlikely that there would be other cancer treatments (e.g. tamoxifen) preferentially received in one randomization group. Let SIOF_CIM(pop) denote the estimated SIOF if the estimation formula was applied to the general population, which includes both compliers and never-takers. Let SIOF_CIM*(pop) denote the estimated SIOF if the estimation formula was applied to a population of compliers. Based on equations (9) and (10)

\begin{array}{l} \mathrm{SIOF}_{CIM(pop)} = (q_{1} - q_{0}) / q_{S} = \{\pi (q_{complier1} - q_{complier0})\} / q_{S} \\ = \pi\, \mathrm{SIOF}_{CIM*(pop)} \end{array}

(11)
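The middle equality in equation (11) rests on the never-taker terms cancelling when equation (9) is subtracted from equation (10). Written out as a short sketch using only the quantities defined above:

```latex
\begin{aligned}
q_{1} - q_{0}
&= \{\pi\, q_{complier1} + (1 - \pi)\, q_{never\text{-}taker}\}
 - \{\pi\, q_{complier0} + (1 - \pi)\, q_{never\text{-}taker}\} \\
&= \pi\, (q_{complier1} - q_{complier0}).
\end{aligned}
```

Dividing both sides by q_S gives equation (11). The cancellation is exactly where the assumption that q_never-taker does not vary by randomization group is used.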

Let SIOF_CIM* denote the estimated SIOF among compliers in the randomized trial. Let π_est denote the fraction randomized to screening who receive screening. Ignoring the sampling variability in π_est, which is small due to the large numbers involved, equation (11) implies

\mathrm{SIOF}_{CIM*} = \mathrm{SIOF}_{CIM} / \pi_{est}

(12)

\mathrm{Var}_{est}(\mathrm{SIOF}_{CIM*}) = \mathrm{Var}_{est}(\mathrm{SIOF}_{CIM}) / \pi_{est}^{2}

(13)

Use of SIOF_CIM* preserves the advantage of randomization and thereby avoids the bias of comparing estimated SIOF in refusers with estimated SIOF among persons receiving screening. We assume SIOF_CIM*, which strictly applies only to compliers, generalizes to all participants. This is a similar level of generalizability as going from results of a randomized trial to the general population. The 95% BPCI adjusted for refusers is SIOF_CIM* ± 1.96 Var(SIOF_CIM*)^½. The adjustment for refusers increases the width of the confidence interval beyond the increased width due to the use of BPCI instead of BCI. See Appendix 1 for estimates and estimated variances for SOF and POF.
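The adjustment in equations (12) and (13) amounts to dividing the estimate by π_est and the variance by π_est². The sketch below applies it to the Malmo counts and attendance fraction quoted in the Results; the function name `adjust_for_refusers` is hypothetical, r = 1 is assumed, and the output may differ from the reported interval in the last digit because of rounding.

```python
import math


def adjust_for_refusers(siof, var, pi_est, z=1.96):
    """Scale the CIM estimate and its variance by the attendance fraction."""
    siof_star = siof / pi_est            # equation (12)
    var_star = var / pi_est**2           # equation (13)
    half = z * math.sqrt(var_star)
    return siof_star, var_star, (siof_star - half, siof_star + half)


# Malmo counts and attendance fraction as quoted in the text; r = 1 assumed.
n0, n1, n_s, pi_est = 698, 780, 438, 0.71
n_c = n1 - n_s
siof = (n1 - n0) / n_s                                      # equation (1)
var = ((n0 - n_c) ** 2 + (n0 + n_c) * n_s) / n_s**3         # equation (8)

siof_star, var_star, (lo, hi) = adjust_for_refusers(siof, var, pi_est)
print(f"unadjusted SIOF = {siof:.3f}, adjusted SIOF = {siof_star:.3f}")
print(f"adjusted 95% BPCI = ({lo:.3f}, {hi:.3f})")
```

Because π_est is below 1, both the point estimate and the interval half-width grow by the same factor 1/π_est, which is why the adjustment widens the interval.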

Wider BPCIs with longer follow-up

The widening of the BPCI with longer follow-up can be seen by inspecting the variance in equation (8). A longer follow-up time increases n₀ and n_C while n_S remains constant. The magnitudes of the increases in n₀ and n_C are similar because both involve clinical cancers arising in the follow-up period, which occur in approximately equal numbers in each randomization group. Therefore, with longer follow-up, the main change in the variance in equation (8) is an increase in n₀ + n_C in the numerator, which increases its magnitude. As shown in Appendix 1, BPCIs for SOF and confidence intervals for estimated POF also widen with longer follow-up.

Results

Reanalysis of breast cancer stop-screen trials

Using the BPCI with adjustment for refusers (where applicable), we reanalyzed the breast cancer stop-screen trials analyzed by the UK Panel. For these trials, the number randomized to the screening program approximately equaled the number randomized to the control group. Following the UK Panel, we discuss estimation of SIOF for all breast cancers (invasive and in situ).

The Canadian National Breast Screening Study-1 (Canada 1 trial) randomized approximately 50,000 women aged 40–49 to four or five annual rounds of mammography and physical examination versus an initial physical examination followed by usual care.¹⁶ The UK Panel analyzed the following data from Miller et al.¹⁶ involving follow-up times of 11 to 16 years. Based on the abstract in Miller et al.,¹⁶ there were n₀ = 552 + 29 = 581 breast cancers in the control group and n₁ = 592 + 71 = 663 breast cancers in the screened group, yielding an excess of 82 cancers in the screened versus control groups. Combining the numbers of invasive breast cancers in Table 2 of Miller et al.¹⁶ with their reported 71 in situ breast cancers yielded n_S = 21 + 66 + 17 + 48 + 73 + 40 + 25 + 71 = 361 breast cancers (screen-detected preclinical cancers and interval cancers) diagnosed during the screening period.

The Canadian National Breast Screening Study-2 (Canada 2 trial) randomized approximately 40,000 women aged 50–59 to four or five annual rounds of mammography and physical examination versus annual physical examinations.¹⁷ The UK Panel analyzed the following data from Miller et al.¹⁷ involving follow-up times of 11 to 16 years. Based on the abstract in Miller et al.,¹⁷ there were n₀ = 610 + 16 = 626 breast cancers in the control group and n₁ = 622 + 71 = 693 breast cancers in the screened group, yielding an excess of 67 breast cancers in the screened versus control groups. Combining the numbers of invasive cancers in Table 2 of Miller et al.¹⁷ with their reported 71 in situ cancers yielded n_S = 118 + 149 + 14 + 36 + 32 + 71 = 420 breast cancers (screen-detected preclinical cancers and interval cancers) diagnosed during the screening period.

The Malmo study randomized approximately 25,000 women aged 50–59 to an invitation for at least five rounds of mammography in intervals of 18 to 24 months versus no screening.¹⁸ The UK Panel analyzed the following data from Zackrisson et al.¹⁹ involving follow-up after the last screen to 15 years and an age range of 50–59. Based on Table 2 in Zackrisson et al.,¹⁹ there were n₀ = 324 + 374 = 698 breast cancers in the control group and n₁ = 438 + 342 = 780 breast cancers in the screened group, yielding an excess of 82 breast cancers in the screened versus control groups. Based on Table 2 in Zackrisson et al.,¹⁹ there were n_S = 438 breast cancers (screen-detected preclinical cancers and interval cancers) diagnosed during the screening period (which they called “Period 1”). A weighted average of data for ages 55–59, 60–64, and 65–69 (Table 1 of Zackrisson et al.²⁰) indicates that 71% of invited women aged 55–69 attended the first screen, so π_est = 0.71.

Substituting the values for n₀, n₁, and n_S from these trials into equations (1), (8), (12), and (13) yielded estimates of SIOF adjusted for refusers (if applicable) and with corrected 95% confidence intervals. For the Canada 1 trial, the estimated SIOF was 0.23. The UK Panel reported a 95% BCI of (0.18, 0.27) while the 95% BPCI was (0.04, 0.41). For the Canada 2 trial, the estimated SIOF was 0.16. The UK Panel reported a 95% BCI of (0.12, 0.19) while the 95% BPCI was (−0.01, 0.32). For the Malmo study with no correction for refusers, the estimated SIOF was 0.19. The UK Panel reported a 95% BCI of (0.15, 0.22). Adjusting for refusers using the method of latent class instrumental variables, the estimated SIOF was 0.26 with a 95% BPCI of (0.03, 0.50). Figure 1 shows a graphical display of these results.
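The contrast between the two interval types can be checked from the counts given above. The sketch below computes the half-width implied by the binomial variance in equation (2) and by the binomial-Poisson variance in equation (8) for each trial, without the adjustment for refusers. It assumes r = 1; the helper name `half_widths` is hypothetical, and results may differ from the reported values in the last digit.

```python
import math

# Counts (n0, n1, n_S) as quoted in the text for each trial; r = 1 assumed.
trials = {
    "Canada 1": (581, 663, 361),
    "Canada 2": (626, 693, 420),
    "Malmo": (698, 780, 438),
}


def half_widths(n0, n1, n_s, z=1.96):
    """Return the SIOF estimate and the BCI and BPCI half-widths."""
    n_c = n1 - n_s
    siof = (n1 - n0) / n_s                                     # equation (1)
    var_bin = siof * (1 - siof) / n_s                          # equation (2)
    var_bp = ((n0 - n_c) ** 2 + (n0 + n_c) * n_s) / n_s**3     # equation (8)
    return siof, z * math.sqrt(var_bin), z * math.sqrt(var_bp)


results = {name: half_widths(*counts) for name, counts in trials.items()}
for name, (siof, hw_bin, hw_bp) in results.items():
    print(f"{name}: SIOF = {siof:.3f}, BCI half-width = {hw_bin:.3f}, "
          f"BPCI half-width = {hw_bp:.3f}, ratio = {hw_bp / hw_bin:.1f}")
```

On these counts the binomial-Poisson half-width is several times the binomial half-width in every trial, before any adjustment for refusers.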

Figure 1.

Estimated SIOF for all breast cancers in the Canada 1, Canada 2, and Malmo trials. The blue vertical lines indicate the 95% BCI reported by the UK Panel, which is based on an incorrect application of the binomial variance. The green vertical lines indicate the correct 95% binomial-Poisson confidence interval (BPCI). The red vertical lines indicate the correct 95% binomial-Poisson confidence interval with an adjustment for refusers via latent class instrumental variables.

Reanalysis of two breast cancer stop-screen trials with long follow-up

For the Canada 1 and Canada 2 stop-screen trials, data were available on cumulative numbers of breast cancers at various years after the last screen.²¹ Participants received either four or five annual interventions prior to follow-up. Based on Table 2A in Baines et al.,²¹ for Canada 1, n₀ = {234, 271, 318, 373, 432, 487, 828, 1322, 1633} and n₁ = {326, 371, 424, 480, 533, 590, 958, 1432, 1771} for follow-up year after the last screen of {0, 1, 2, 3, 4, 5, 10, 15, 20}, with n_S = 326. Based on Table 2B in Baines et al.,²¹ for Canada 2, n₀ = {262, 304, 349, 406, 475, 536, 898, 1293, 1518} and n₁ = {377, 424, 454, 499, 557, 615, 942, 1338, 1568} for follow-up year after the last screen of {0, 1, 2, 3, 4, 5, 10, 15, 20}, with n_S = 377. (The counts based on time since last screen do not match the counts based on the ideal metric of time since randomization discussed in Miller et al.¹⁶,¹⁷ In particular, the counts for n_S differ with these metrics because the last screen was screen 4 in some participants and screen 5 in other participants.) Figure 2 shows a graphical display of these results. To better illustrate the variability of the estimated SIOF over time, Figure 2(b) and (d) show 100 estimated SIOF curves arising from counts randomly generated under a Poisson distribution, with some of the most extreme curves highlighted.

Figure 2.

SIOF estimates from stop-screen trials: (a) Canada 1 with 95% BPCIs, (b) Canada 1 with randomly generated curves based on observed data; colored curves show some of the extreme realizations, (c) Canada 2 with 95% BPCIs, and (d) Canada 2 with randomly generated curves based on observed data; colored curves show some of the extreme realizations.
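The widening of the BPCI with follow-up can be traced numerically from the Baines et al. counts listed above. The sketch below evaluates equations (1) and (8) at each follow-up year; it assumes r = 1, and the helper name `siof_curve` is hypothetical.

```python
import math

years = [0, 1, 2, 3, 4, 5, 10, 15, 20]

# Cumulative counts by follow-up year after the last screen, as quoted above.
canada1 = {
    "n0": [234, 271, 318, 373, 432, 487, 828, 1322, 1633],
    "n1": [326, 371, 424, 480, 533, 590, 958, 1432, 1771],
    "n_s": 326,
}
canada2 = {
    "n0": [262, 304, 349, 406, 475, 536, 898, 1293, 1518],
    "n1": [377, 424, 454, 499, 557, 615, 942, 1338, 1568],
    "n_s": 377,
}


def siof_curve(trial, z=1.96):
    """Return (SIOF, BPCI half-width) for each follow-up year, with r = 1."""
    n_s = trial["n_s"]
    rows = []
    for n0, n1 in zip(trial["n0"], trial["n1"]):
        n_c = n1 - n_s
        siof = (n1 - n0) / n_s                                  # equation (1)
        var = ((n0 - n_c) ** 2 + (n0 + n_c) * n_s) / n_s**3     # equation (8)
        rows.append((siof, z * math.sqrt(var)))
    return rows


curve1 = siof_curve(canada1)
curve2 = siof_curve(canada2)
for year, (s1, h1), (s2, h2) in zip(years, curve1, curve2):
    print(f"year {year:2d}: Canada 1 SIOF = {s1:.2f} +/- {h1:.2f}, "
          f"Canada 2 SIOF = {s2:.2f} +/- {h2:.2f}")
```

In both trials the half-width grows at every listed follow-up year, while the point estimate rises for Canada 1 and falls for Canada 2.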

Reanalysis of lung cancer stop-screen trials

In applications to stop-screen trials for lung cancer screening, the BPCIs for the estimated SOF agreed with the bootstrap confidence intervals. For the estimated SOF of 0.185 in a stop-screen trial of low-dose computed tomography screening for lung cancer,²² the 95% BPCI was (0.055, 0.314) and the 95% bootstrap confidence interval was (0.054, 0.306). For the estimated SOF of 0.197 in a stop-screen trial with volume CT screening for lung cancer,²³ the 95% BPCI was (–0.035, 0.429) and the 95% bootstrap CI was (–0.052, 0.416). These results increase confidence in the appropriateness of the BPCI.
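A similar agreement check can be sketched by simulation. The counts from the lung cancer trials are not given in this text, so the sketch below uses the Canada 1 counts instead. It is also a parametric simulation under the Poisson and binomial assumptions of the Methods section, not the bootstrap procedure used in the cited analyses. The seed, the number of replicates, and the variable names are arbitrary choices, and r = 1 is assumed.

```python
import numpy as np

rng = np.random.default_rng(2020)

# Canada 1 counts from the text, used as plug-in parameter values (r = 1).
n0_obs, n1_obs, n_s_obs = 581, 663, 361
pi_hat = n_s_obs / n1_obs
n_sims = 100_000

# Simulate under the assumed model: Poisson totals, binomial split of n1.
n0 = rng.poisson(n0_obs, size=n_sims)
n1 = rng.poisson(n1_obs, size=n_sims)
n_s = rng.binomial(n1, pi_hat)
siof_sim = (n1 - n0) / n_s                       # equation (1) per replicate

sim_lo, sim_hi = np.percentile(siof_sim, [2.5, 97.5])

# Delta-method interval from equation (8) for comparison.
n_c_obs = n1_obs - n_s_obs
siof_obs = (n1_obs - n0_obs) / n_s_obs
sd_delta = np.sqrt(((n0_obs - n_c_obs) ** 2
                    + (n0_obs + n_c_obs) * n_s_obs) / n_s_obs**3)
bp_lo, bp_hi = siof_obs - 1.96 * sd_delta, siof_obs + 1.96 * sd_delta

print(f"simulated SD = {siof_sim.std(ddof=1):.4f}, delta-method SD = {sd_delta:.4f}")
print(f"simulated 95% interval = ({sim_lo:.3f}, {sim_hi:.3f})")
print(f"95% BPCI               = ({bp_lo:.3f}, {bp_hi:.3f})")
```

If the delta-method approximation is adequate at these counts, the simulated standard deviation and percentile interval should sit close to the BPCI.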

Discussion

Despite its limitations, the CIM to estimate the fraction overdiagnosed in a stop-screen trial is the most reliable method for estimating the fraction overdiagnosed. Other methods of estimation, involving data from other types of trials or with population data or with mathematical modeling, require more assumptions.²⁴ If data are available on the number of screen detections, investigators should estimate SOF instead of SIOF because SOF is more relevant to patients. POF is useful for weighing benefits and harms. Benefits involve possible reduction in breast cancer mortality. Harms include the cumulative numbers of false positives²⁵,²⁶ (a positive screen followed by a negative work-up) and overdiagnosis.

As follow-up time after the last screen increases, the bias of the estimated SIOF, SOF, or POF (all based on the CIM) decreases.⁴ To understand this decrease in bias with longer follow-up, consider the following thought experiment involving bias of estimated SIOF. Suppose you went back in time and assigned all persons in the screened group to a counterfactual control group who did not receive screening. Assume there is no overdiagnosis. Therefore, any difference between the cumulative number of cancers in the screened group and the cumulative number of cancers in the counterfactual control group (at a given time) yields a value of SIOF that is biased. Screening detects a cancer in the screened group that would have surfaced later in the counterfactual control group. Therefore, at the time of the last screen, the cumulative number of cancers in the screened group is larger than the cumulative number of cancers in the counterfactual control group, which can only be bias in the SIOF estimate since we have assumed no overdiagnosis. As follow-up time increases, the bias in SIOF decreases because the cumulative number of cancers in the counterfactual control group is catching up to the cumulative number of cancers in the screened group (arising from the diagnosis of cancers in the counterfactual control group that were previously detected on screening in the screened group). Our key contribution is showing that, while bias decreases, confidence intervals for estimated SIOF, SOF, and POF widen as follow-up time increases, a bias-variance tradeoff. If bias levels off starting at some time since randomization (indicated by constant estimated SIOF, SOF, or POF), we suggest reporting the estimate and estimated variance at the start of this time range (when the estimated variance should be smallest).

The key mathematical result underpinning the correct computation of the estimated variance is that random variables generated from a binomial distribution with counts that are a Poisson random variable are independent Poisson random variables. When one considers that the sum of two Poisson random variables is a Poisson random variable, this result is intuitive, as conditioning a binomial distribution on a Poisson random variable provides no information for constraining the resulting random variables.
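This result can be verified directly from the joint probability mass function. The following sketch uses generic symbols that are not in the original text: N for the Poisson total with mean μ, X for the binomial count given N, and Y = N − X.

```latex
\begin{aligned}
P(X = x,\, Y = y)
&= P(N = x + y)\; P(X = x \mid N = x + y) \\
&= \frac{e^{-\mu} \mu^{x+y}}{(x+y)!} \cdot \frac{(x+y)!}{x!\, y!}\, \pi^{x} (1 - \pi)^{y} \\
&= \frac{e^{-\pi \mu} (\pi \mu)^{x}}{x!} \cdot \frac{e^{-(1-\pi)\mu} \{(1-\pi)\mu\}^{y}}{y!}.
\end{aligned}
```

The joint probability factors into a Poisson(πμ) term in x and a Poisson((1 − π)μ) term in y, so X and Y are independent Poisson random variables. With N = n₁, X = n_S, and Y = n_C, this is consistent with the zero covariance in equation (4) and with Var(n_S) = E(n_S) in equation (6).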

The difference in shape of the estimated SIOF curves for Canada 1 and Canada 2 in Figure 2 is striking. If there is no overdiagnosis or constant overdiagnosis rates over follow-up time, narrow confidence intervals, correct implementation of randomization, and no screening in the follow-up period, the estimated SIOF should decrease with follow-up time, as with the estimated SIOF curve for Canada 2. Possible explanations for the increasing estimated SIOF curve for Canada 1 are random variability (as some randomly generated estimated SIOF curves decrease with follow-up time), overdiagnosis rates that increase with age, imperfect randomization, and screening in the follow-up period.

There are concerns of bias in estimating the fraction overdiagnosed in the Canada and Malmo trials due to screening during the follow-up period.²⁷ Screening during the follow-up period among controls would yield an estimated SIOF that would be smaller than the estimated SIOF under the required scenario of no screening during the follow-up period (because of overdiagnosis in the control group during follow-up). Screening during the follow-up period among the screened group would yield an estimated SIOF that would be larger than the estimated SIOF under the required scenario of no screening during the follow-up period (because of overdiagnosis in the screened group during follow-up). The overall effect of screening during follow-up would depend on how much screening occurred in the control group and how much occurred in the screened group during follow-up. For the Malmo trial, Njor et al.²⁷ provided strong evidence of screening after the end of follow-up in the group invited for screening. For the Canadian trials, Njor et al.²⁷ cited a report that screening was implemented after the trial stopped in provinces from which the trial population was recruited. If this post-trial screening occurred preferentially in the screened group, there would be upward bias in the estimated fraction overdiagnosed. Following these arguments, there is a strong risk of upward bias in the estimated fraction overdiagnosed in both the Canada and Malmo trials.

The two-sided method of latent class instrumental variables would adjust for both refusers in the screened group and participants in the control group who immediately start screening after randomization.⁸ The method applies only to immediate switching of treatments after randomization. Adjusting for delayed switching of treatments after randomization (e.g. participants who receive some screens and then stop receiving screens) requires much more restrictive assumptions, which may not hold.⁸

In summary, the reported 95% BCIs for SIOF in the Canada 1, Canada 2, and Malmo breast cancer stop-screen trials are incorrect due to misuse of the binomial variance. The correct 95% confidence intervals are BPCIs with adjustment for refusers in the Malmo trial. They are much wider than the 95% BCIs. In other words, there is much more uncertainty in the amount of overdiagnosis in breast cancer screening than previously reported. We recommend that investigators quantify overdiagnosis in stop-screen trials by estimating SIOF or SOF and computing 95% BPCIs with adjustment for refusers via the method of latent class instrumental variables.

Supplemental Material

Supplemental material, sj-jpg-1-msc-10.1177_0969141320950784 for Breast cancer overdiagnosis in stop-screen trials: More uncertainty than previously reported by Stuart G Baker and Philip C Prorok in Journal of Medical Screening

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Cancer Institute.

ORCID iD

Stuart G Baker

Appendix 1

References

1. Marmot, Altman, Cameron, et al. The benefits and harms of breast cancer screening: an independent review. Lancet 2012; 380: 1778–1786.

2. Independent UK Panel on Breast Cancer Screening. The benefits and harms of breast cancer screening: an independent review. Br J Cancer 2013; 108: 2205–2240.

3. Biesheuvel, Barratt, Howard, et al. Effects of study methods and biases on estimates of invasive breast cancer overdetection with mammography screening: a systematic review. Lancet Oncol 2007; 8: 1129–1138.

4. Duffy, Parmar. Overdiagnosis in breast cancer screening: the importance of length of observation period and lead time. Breast Cancer Res 2013; 15: R41.

5. Jacklyn, Glasziou, Macaskill, et al. Meta-analysis of breast cancer mortality benefit and overdiagnosis adjusted for adherence: improving information on the effects of attending screening mammography. Br J Cancer 2016; 114: 1269–1276.

6. Baker, Prorok, Kramer BS. Lead time and overdiagnosis. J Natl Cancer Inst 2014; 106: dju346.

7. Nelson, Pappas, Cantor, et al. Harms of breast cancer screening: systematic review to update the 2009 U.S. Preventive Services Task Force Recommendation. Ann Intern Med 2016; 164: 256–267.

8. Baker, Kramer, Lindeman KS. Latent class instrumental variables: a clinical and biostatistical perspective. Stat Med 2016; 35: 147–160.

9. Baker, Lindeman KS. The paired availability design: a proposal for evaluating epidural analgesia during labor. Stat Med 1994; 13: 2269–2278.

10. Imbens, Angrist JD. Identification and estimation of local average treatment effects. Econometrica 1994; 62: 467–475.

11. Angrist, Imbens, Rubin DB. Identification of causal effects using instrumental variables. J Am Stat Assoc 1996; 91: 444–455.

12. Imbens, Rubin DB. Bayesian inference for causal effects in randomized experiments with noncompliance. Ann Stat 1997; 25: 305–327.

13. Bloom HS. Accounting for no-shows in experimental evaluation designs. Eval Rev 1984; 8: 225–246.

14. Newcombe RG. Explanatory and pragmatic estimates of the treatment effect when deviations from allocated treatment occur. Stat Med 1988; 7: 1179–1186.

15. Cuzick, Edwards, Segnan. Adjusting for non-compliance and contamination in randomized clinical trials. Stat Med 1997; 16: 1017–1029.

16. Miller, Baines, et al. Canadian National Breast Screening Study: 1. Breast cancer detection and death rates among women aged 40 to 49 years. Can Med Assoc J 1992; 147: 1459–1476.

17. Miller, Baines, et al. Canadian National Breast Screening Study-2: 13-year results of a randomized trial in women aged 50–59 years. J Natl Cancer Inst 2000; 92: 1490–1499.

18. Andersson, Aspegren, Janzon, et al. Mammographic screening and mortality from breast cancer: the Malmö mammographic screening trial. Br Med J 1988; 297: 943–948.

19. Zackrisson, Andersson, Janzon, et al. Rate of over-diagnosis of breast cancer 15 years after end of Malmö mammographic screening trial: follow-up study. Br Med J 2006; 332: 689–692.

20. Zackrisson, Andersson, Manjer, et al. Non-attendance in breast cancer screening is associated with unfavourable socio-economic circumstances and advanced carcinoma. Int J Cancer 2004; 108: 754–760.

21. Baines, Miller AB. Revised estimates of overdiagnosis from the Canadian National Breast Screening Study. Prev Med 2016; 90: 66–71.

22. Patz Jr, Pinsky, Gatsonis, et al. Overdiagnosis in low-dose computed tomography screening for lung cancer. JAMA Intern Med 2014; 174: 269–274.

23. de Koning, van der Aalst, de Jong, et al. Reduced lung-cancer mortality with volume CT screening in a randomized trial. N Engl J Med 2020; 382: 503–513.

24. Ripping, Ten Haaf, Verbeek ALM, et al. Quantifying overdiagnosis in cancer screening: a systematic review to evaluate the methodology. J Natl Cancer Inst 2017; 109: djx064.

25. Baker, Erwin, Kramer BS. Estimating the cumulative risk of false positive cancer screenings. BMC Med Res Methodol 2003; 3: 11.

26. Baker, Kramer BS. Estimating the cumulative risk of a false positive under a regimen involving various types of cancer screening tests. J Med Screen 2008; 15: 18–22.

27. Njor, Garne, Lynge. Over-diagnosis estimate from The Independent UK Panel on Breast Cancer Screening is based on unsuitable data. J Med Screen 2013; 20: 104–105.
