A note on the design of cancer screening trials

Abstract

Objectives

To investigate the consequences of different cancer screening trial designs and follow-up options for accuracy of the estimate of the effect of screening on disease-specific mortality.

Methods

We consider a randomized trial of breast cancer screening with a screening phase in which the intervention group is offered screening and the control group is not, and optional further follow-up after this screening phase. Postulating a lead time effect similar to that observed in breast cancer screening trials, we calculate the observed relative risk of disease-specific mortality and compare this with the true relative risk, for four design options: (1) no follow-up beyond the screening phase, ie. the screening phase and the observation period are identical; (2) follow-up continuing beyond the screening phase, all cancer-specific deaths counted, including those diagnosed after the screening phase; (3) follow-up continuing beyond the screening phase, but with only deaths from cancers diagnosed during the screening phase included; and (4) follow-up continuing beyond the screening phase, a single screen of the control group conducted at the end of the screening phase, and only deaths from cancers diagnosed during the screening phase in both arms up to completion of the single control screen included.

Results

All designs in which follow-up for mortality continues beyond the screening phase incurred a bias against screening. The design in which the control group undergoes a single screen at the end of the screening phase was least biased in the example used.

Conclusions

The expedient of a single screen of the control group at the end of the screening phase has acceptable accuracy, but is still slightly conservatively biased.

Keywords

Cancer screening, trial design, follow-up

Introduction

Randomized trials of cancer screening can involve challenging design choices.^1,2 Ideally, a population would be randomized to the offer of regular screening (intervention group) or to usual care (control group) for a long period of time. Deaths during that period from cancers diagnosed during that period would constitute the trial endpoint. We are here considering screening tests such as mammography or faecal occult blood testing, which are primarily aimed at detection of cancer at an earlier and more treatable stage, and not tests such as flexible sigmoidoscopy or cervical smear testing, which have a major effect of detection and removal of pre-malignancies, and therefore prevention of cancer, albeit while also detecting some cancers early.^3,4

Often, research resources are not available for the screening to continue for long enough to have sufficient endpoint events for adequate statistical power. In this case, a number of options are available. One is to make screening available to the intervention group for a limited period of time, but to extend the period of observation beyond the screening phase. In this design, the intervention group is invited to screening during the earlier period, but not the later, and the control group is never invited to screening. This will entail a conservative bias: in the later period, after the screening phase has ended, neither group is receiving screening, so there will be no screening effect on deaths from the cancers diagnosed during this period.

A second possibility is to screen the intervention group for a limited period and, as in the previous example, never screen the control group. Mortality is measured in the follow-up period, but only deaths from cancers diagnosed during the screening phase are included. This too will be conservatively biased, as there will be deaths from cancers included in the intervention group that would have been diagnosed after the end of the screening phase if no screening had taken place, but were diagnosed during the screening phase as a result of lead time.⁵ Their counterparts in the control group are diagnosed after the screening phase, due to the absence of lead time in this unscreened population, and are therefore excluded from the analysis.

A third option is to screen the intervention group for a limited period, and at the end of that period, offer one round of screening to the control group. In subsequent follow-up for cancer mortality, deaths are included from all cancers diagnosed in both groups between randomization and the end of the single round of screening in the control group. This design too will dilute the effect of screening, as it includes cancers diagnosed in a period when both groups are receiving the same screening intervention.

In the material following, we demonstrate these conservative biases more formally, with a numerical example.

Methods

Screening designs and endpoints

Suppose we propose a trial to screen for a cancer which, in the absence of screening, has annual incidence rate r, and for which the annual case fatality rate from the time of symptomatic diagnosis is p. Suppose further that, on average, the offer of screening changes this case fatality rate (only of those cancers that would occur symptomatically, and from the date when they would have been diagnosed with symptoms) to θp, where θ < 1. Thus the true relative risk of mortality from the specific cancer conferred by the offer of screening is θ. This will be an average effect, as those who do not take up the offer of screening will receive no benefit, whereas those who do will receive a greater benefit, ie. a relative risk smaller than θ.

If the screening works by diagnosing the cancer at an earlier stage, it must confer a lead time, ie. the screen-detected cancers are diagnosed some time before they would have been diagnosed in the absence of screening. Suppose that at the end of the screening phase, the intervention arm has an additional 2r cancers brought forward by lead time: 0.8r from those that would have arisen without screening during the year following the end of screening, 0.6r from the next year, 0.4r from the third year after screening ceases, and 0.2r from the fourth year. To demonstrate the effect of lead time, we posit this rather specific effect, but note that its magnitude is similar to that observed in the breast screening trials.⁶ These are cancers that would have arisen in any case. The screening may also detect some cancers that are overdiagnosed, but these will not contribute to mortality from the disease.
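The lead-time assumption can be written down as a small table. The following is a minimal sketch for checking the arithmetic only; the names `ADVANCED`, `total_advanced` and `residual` are illustrative and not part of any trial analysis, and exact fractions are used simply to avoid rounding noise.

```python
from fractions import Fraction

# Fraction of each year's cancers (in units of r) assumed to be brought
# forward to the end of the screening phase by lead time.
# Keys are years after screening ceases (1 = first year after the end).
ADVANCED = {1: Fraction(4, 5), 2: Fraction(3, 5), 3: Fraction(2, 5), 4: Fraction(1, 5)}


def total_advanced(advanced=ADVANCED):
    """Total additional cancers at the end of screening, in units of r."""
    return sum(advanced.values())


def residual(advanced=ADVANCED):
    """Fraction of each year's cancers still diagnosed symptomatically in that year."""
    return {year: 1 - frac for year, frac in advanced.items()}


print(total_advanced())  # 2
print(residual()[1])     # 1/5
```

The residual fractions (0.2, 0.4, 0.6, 0.8) are the cancers that remain unaffected by screening once the screening phase has ended.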

Design 1: Identical screening and observation periods

Suppose the screening and observation periods are the same, say 15 years following randomization. Approximating the time of diagnosis in any given year as the mid-point of that year, a cancer diagnosed in year i is at risk of death for 15 − i + 0.5 years of observation, so in the control group the observed overall rate of deaths from the relevant cancer will be

D_{c} = \sum_{i = 1}^{15} r (15 - i + 0.5) p = 112.5 rp

In the intervention group, it will be

D_{i} = \sum_{i = 1}^{15} r (15 - i + 0.5) θ p = 112.5 r θ p = θ D_{c}

Thus, this design will yield an unbiased estimated relative risk of RR₁ = θ.

The first mortality report of the Nottingham trial of faecal occult blood testing (FOBT) for colorectal cancer used this design and analysis,⁷ and the Swedish Two-county Trial’s first mortality results pertained mainly to the period before the control group was screened, and therefore correspond to this scenario.⁸
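The Design 1 sums can be checked numerically. This is a minimal sketch under the mid-year diagnosis approximation stated above; the function names and the value of θ used in the call are illustrative assumptions.

```python
def exposure_years(year, follow_up=15):
    """Years at risk of death for a cancer diagnosed at the mid-point of `year`."""
    return follow_up - year + 0.5


def design1_deaths(screen_years=15, theta=1.0):
    """Overall death rate in units of r*p; theta=1 gives the control arm."""
    return theta * sum(exposure_years(i, screen_years) for i in range(1, screen_years + 1))


theta = 0.75  # illustrative value only
d_c = design1_deaths()
d_i = design1_deaths(theta=theta)
print(d_c)        # 112.5
print(d_i / d_c)  # 0.75
```

Because θ multiplies every term of the intervention-arm sum, the ratio equals θ whatever the length of the common screening and observation period.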

Design 2: Extended observation period, all cancers and deaths included

If, on the other hand, screening is offered to the intervention group for the first five years, but cancers and deaths are observed in both groups up to fifteen years, we will have in the control group D_c as before, but in the intervention group, the overall rate of deaths from the cancer will be

D_{i} = \sum_{i = 1}^{5} r (15 - i + 0.5) θ p + (0.8 r θ p + 0.2 rp) \times 9.5 + (0.6 r θ p + 0.4 rp) \times 8.5 + (0.4 r θ p + 0.6 rp) \times 7.5 + (0.2 r θ p + 0.8 rp) \times 6.5 + \sum_{i = 10}^{15} r (15 - i + 0.5) p = 33 rp + 79.5 r θ p = rp (33 + 79.5 θ)

This will give a relative risk of

{RR}_{2} = \frac{79.5 θ + 33}{112.5}

This will exceed θ, and will therefore be biased against the screening effect. The primary analysis of extended follow-up of the Nottingham FOBT trial used this design and analysis.⁹
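The split of the Design 2 intervention-arm rate into a θ-dependent part (79.5) and a θ-independent part (33) can be reproduced as follows. This is a sketch assuming the lead-time fractions and the 5-year and 15-year periods given in the text; the helper names are hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)
FOLLOW_UP = 15     # years of observation
SCREEN_YEARS = 5   # years of screening in the intervention arm
# Fractions of the cancers of trial years 6-9 brought forward by lead time.
ADVANCED = {6: Fraction(4, 5), 7: Fraction(3, 5), 8: Fraction(2, 5), 9: Fraction(1, 5)}


def exposure(year):
    """Years at risk of death for a cancer arising at the mid-point of `year`."""
    return FOLLOW_UP - year + HALF


def design2_coefficients():
    """Return (a, b) such that D_i = r * p * (a * theta + b)."""
    a = sum(exposure(i) for i in range(1, SCREEN_YEARS + 1))
    b = Fraction(0)
    for year in range(SCREEN_YEARS + 1, FOLLOW_UP + 1):
        adv = ADVANCED.get(year, Fraction(0))
        a += adv * exposure(year)        # screen-detected early: fatality theta * p
        b += (1 - adv) * exposure(year)  # diagnosed symptomatically: fatality p
    return a, b


def rr2(theta):
    a, b = design2_coefficients()
    return (a * theta + b) / (a + b)  # a + b is the control-arm coefficient


a, b = design2_coefficients()
print(float(a), float(b))                    # 79.5 33.0
print(round(float(rr2(Fraction(3, 4))), 3))  # 0.823
```

Note that a + b = 112.5, the control-arm coefficient, so the observed relative risk is a weighted average of θ and 1 and therefore lies between them.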

Design 3: Extended observation period, including only deaths from cancers diagnosed during the screening phase

Now suppose that screening is offered to the intervention group for five years, but never to the control group, and both groups are followed up for 15 years, but only deaths from cancers diagnosed in each group during the first five years are included. In the control group, the overall rate of deaths from the relevant cancer will be

D_{c} = \sum_{i = 1}^{5} r (15 - i + 0.5) p = 62.5 rp

In the intervention group, the overall rate will be

D_{i} = \left[ \sum_{i = 1}^{5} (15 - i + 0.5) + 0.8 \times 9.5 + 0.6 \times 8.5 + 0.4 \times 7.5 + 0.2 \times 6.5 \right] r θ p = 79.5 r θ p

and the observed relative risk will be

{RR}_{3} = \frac{79.5 θ}{62.5} = 1.27 \times θ

Again, this is greater than θ, and hence is conservatively biased. The analysis of extended follow-up of the Health Insurance Plan of Greater New York Breast Screening Trial used this strategy.¹⁰
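As a worked check of the Design 3 arithmetic (an editorial illustration using only the lead-time figures already assumed), the cancers advanced by lead time contribute

0.8 \times 9.5 + 0.6 \times 8.5 + 0.4 \times 7.5 + 0.2 \times 6.5 = 7.6 + 5.1 + 3.0 + 1.3 = 17

so that the intervention-arm coefficient is 62.5 + 17 = 79.5, and

\frac{{RR}_{3}}{θ} = \frac{62.5 + 17}{62.5} = 1 + \frac{17}{62.5} = 1.272

The inflation factor does not depend on θ. Under these assumed figures, the observed relative risk would exceed 1 whenever θ > 62.5 / 79.5 ≈ 0.79, so a modest true benefit could appear as no effect or even as apparent harm.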

Design 4: Extended observation period with an exit screen of the control group

In this design, screening is offered to the intervention group for five years and to the control group once, at the close of the screening period, so that the single screening round of the control group concludes at the end of the fifth year. Only deaths in both groups from cancers diagnosed up to the end of the control group screen are included. The purpose of this design is to confer approximately the same number of additional lead time cases in the control group as in the intervention group. The overall cancer mortality in the control group will be

D_{c} = \sum_{i = 1}^{5} r (15 - i + 0.5) p + r (0.8 \times 9.5 + 0.6 \times 8.5 + 0.4 \times 7.5 + 0.2 \times 6.5) θ p = 62.5 rp + 17 r θ p

In the intervention arm, the overall mortality will be

D_{i} = \left[ \sum_{i = 1}^{5} (15 - i + 0.5) + 0.8 \times 9.5 + 0.6 \times 8.5 + 0.4 \times 7.5 + 0.2 \times 6.5 \right] r θ p = 79.5 r θ p

as in design 3 above. Thus the observed relative risk will be

{RR}_{4} = \frac{79.5 r θ p}{62.5 rp + 17 r θ p} = \frac{62.5 θ + 17 θ}{62.5 + 17 θ}

Again, this is larger than θ. The extended follow-up of the Swedish Two-County, Gothenburg and Stockholm screening trials used this estimation strategy.^11,12
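To make the Design 4 inequality explicit (an editorial worked step using the coefficients above), subtract θ from the observed relative risk:

{RR}_{4} - θ = \frac{79.5 θ}{62.5 + 17 θ} - θ = \frac{79.5 θ - 62.5 θ - 17 θ^{2}}{62.5 + 17 θ} = \frac{17 θ (1 - θ)}{62.5 + 17 θ}

This is strictly positive for 0 < θ < 1, so the bias is conservative, and it vanishes as θ approaches 1. Because (1 - θ) / (62.5 + 17 θ) < 1 / 62.5, it is also smaller than the Design 3 bias of 17 θ / 62.5 for any such θ under the same lead-time assumptions.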

Fictitious example

Suppose we are screening for a cancer with annual incidence of 3 per thousand and unscreened case-fatality of 2%, and that the offer of screening reduces this (in real, non-overdiagnosed tumours) to 1.5%, ie. a true relative risk of θ = 0.75. Suppose also the same designs and lead time effects as above. These parameters are not dissimilar to those observed in some of the breast cancer screening trials.^12,13 The four options above would result in relative risk estimates of RR₁ = 0.75, RR₂ = 0.82, RR₃ = 0.95, and RR₄ = 0.79. Thus, the only unbiased design is the one where the screening period and the observation period are the same. The other designs are all conservatively biased, with the smallest bias being observed where an exit screen of the control group takes place.
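The four estimates can be reproduced from the coefficients derived for each design. The sketch below is illustrative only: the dictionary `COEFFS` and both helper functions are hypothetical names, and the per-1000 figures are a crude editorial reading of the rates in the text as expected deaths per person, ignoring competing mortality.

```python
R = 0.003     # annual incidence (3 per thousand), from the example
P = 0.02      # annual case fatality without screening (2%)
THETA = 0.75  # true relative risk, 1.5% / 2%

# (control, intervention) coefficients in units of r*p for Designs 1-4.
COEFFS = {
    1: (112.5, 112.5 * THETA),
    2: (112.5, 79.5 * THETA + 33.0),
    3: (62.5, 79.5 * THETA),
    4: (62.5 + 17.0 * THETA, 79.5 * THETA),
}


def observed_rr(design):
    """Observed relative risk for the given design number."""
    control, intervention = COEFFS[design]
    return intervention / control


def deaths_per_1000(design):
    """Crude counted deaths per 1000 people per arm, as (control, intervention)."""
    control, intervention = COEFFS[design]
    return 1000 * R * P * control, 1000 * R * P * intervention


for d in sorted(COEFFS):
    print(d, round(observed_rr(d), 2), [round(x, 2) for x in deaths_per_1000(d)])
```

The relative risks round to 0.75, 0.82, 0.95 and 0.79, matching the text. Since r and p cancel in every ratio, the incidence and case fatality affect only the absolute numbers of deaths, not the relative risks.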

Discussion

The examples above demonstrate that design expedients in screening trials that involve an observation period that exceeds the screening period all tend to be conservatively biased. In our example, the least biased design was number 4, that in which a single screen is offered to the control group at the end of the screening phase, and only cancers diagnosed in both groups up to the end of screening of the control group are followed up for cause-specific mortality during the observation period. This design may not always be the least conservatively biased. The order of inaccuracy will depend on the relative size of the mortality benefit and the magnitude of lead time effects. However, the fact that all designs with observation beyond the screening period are conservative is generalizable.
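How the ordering of the designs shifts with the size of the benefit and the length of observation can be explored with a parametric version of the same calculation. This is a sketch under the model of this note; the function names, the alternative value θ = 0.4, and the 9-year observation period are hypothetical choices made only to show that the ranking can change.

```python
def coefficients(screen_years=5, follow_up=15, advanced=(0.8, 0.6, 0.4, 0.2)):
    """Intervention-arm coefficients (in units of r*p).

    Returns (s, lead, late):
      s    - cancers diagnosed during the screening phase
      lead - later cancers brought forward into the screening phase by lead time
      late - later cancers still diagnosed symptomatically after screening ends
    """
    def exposure(year):
        # Years at risk of death, with diagnosis at the mid-point of `year`.
        return follow_up - year + 0.5

    s = sum(exposure(i) for i in range(1, screen_years + 1))
    lead = late = 0.0
    for k, year in enumerate(range(screen_years + 1, follow_up + 1)):
        adv = advanced[k] if k < len(advanced) else 0.0
        lead += adv * exposure(year)
        late += (1.0 - adv) * exposure(year)
    return s, lead, late


def biases(theta, **design):
    """Observed relative risk minus theta for Designs 2, 3 and 4."""
    s, lead, late = coefficients(**design)
    return {
        2: ((s + lead) * theta + late) / (s + lead + late) - theta,
        3: (s + lead) * theta / s - theta,
        4: (s + lead) * theta / (s + lead * theta) - theta,
    }


def ranking(theta, **design):
    """Design numbers ordered from least to most biased."""
    b = biases(theta, **design)
    return sorted(b, key=b.get)


print(ranking(0.75))               # [4, 2, 3]
print(ranking(0.40))               # [4, 3, 2]
print(ranking(0.75, follow_up=9))  # [2, 4, 3]
```

With the parameters of the example, Design 4 is least biased. A larger assumed benefit swaps the order of Designs 2 and 3, and a shorter observation period makes Design 2 the least biased, while every bias remains positive.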

It should be noted that the design of cancer screening trials also has implications for estimation of lead times and overdiagnosis rates. This is not dealt with here, but is the subject of ongoing research.

Our conclusions about design number 4 also depend on the magnitude of lead time at the end of the screening period being equal in both groups, which, in turn, will depend on the amount of actual screening exposure in both groups just before the end of the screening period. There may be some mismatch of timing between the last screen of the intervention group and the single screen of the control group. A reasonable check on whether the lead time effects are equal is given by the difference between the groups with respect to cumulative incidence of cancer at the end of the control screen. If this difference is small, then the lead time effects are likely to be the same in each group, which should be the case if randomization was effective, and the conservative nature of design number 4 will hold.

The Swedish Two-County Trial is a case in point. Before the screen of the control group, there was an excess incidence in the intervention group.⁸ Incidence in both groups equalized immediately upon conclusion of the single screen of the control group.⁶
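The cumulative incidence check described above could be sketched as follows. The text specifies only the difference between groups; the normal-approximation (Wald) confidence interval is an added assumption, the function name is hypothetical, and the counts in the call are invented purely to show usage and are not trial data.

```python
import math


def cumulative_incidence_difference(cases_int, n_int, cases_ctl, n_ctl, z=1.96):
    """Difference in cumulative incidence (intervention minus control)
    with a normal-approximation (Wald) confidence interval."""
    p_int = cases_int / n_int
    p_ctl = cases_ctl / n_ctl
    diff = p_int - p_ctl
    se = math.sqrt(p_int * (1 - p_int) / n_int + p_ctl * (1 - p_ctl) / n_ctl)
    return diff, (diff - z * se, diff + z * se)


# Hypothetical counts at the end of the control screen; not trial data.
diff, (lo, hi) = cumulative_incidence_difference(770, 50_000, 755, 50_000)
print(f"difference per 1000: {1000 * diff:.2f} "
      f"(95% CI {1000 * lo:.2f} to {1000 * hi:.2f})")
```

A difference close to zero, with an interval that excludes any large excess in either arm, would be consistent with similar lead-time effects in the two groups.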

Concerns have been expressed in the past about the policy of a single screen of the control group at the end of the screening period in cancer screening trials.¹⁴ The examples above indicate that with incidence, fatality, and lead time parameters typical of the breast screening trials, this methodological approach is the least conservative of several design expedients, but still will tend to underestimate the benefit of screening.

Footnotes

Funding

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

References

1. Moss. Design issues in cancer screening trials. Stat Methods Med Res 2010; 19: 451–61.

2. Smith, Duffy, Gabe, Tabar, Yen AMF, Chen HHT. The randomized trials of breast cancer screening: what have we learned? Radiol Clin N Amer 2004; 42: 793–806.

3. Sasieni, Adams, Cuzick. Benefit of cervical screening at different ages: evidence from the UK audit of screening histories. Br J Cancer 2003; 89: 88–93.

4. Atkin, Edwards, Kralj-Hans, Wooldrage, Hart, Northover, Parkin, Wardle, Duffy, Cuzick. Once-only flexible sigmoidoscopy screening in prevention of colorectal cancer: a multicentre randomised controlled trial. Lancet 2010; 375: 1624–33.

5. Njor, Nyström, Moss, Paci, Broeders, Segnan, Lynge; Euroscreen Working Group. Breast cancer mortality in mammographic screening in Europe: a review of incidence-based mortality studies. J Med Screen 2012; 19(Suppl 1): 33–41.

6. Duffy, Agbaje, Tabar, Vitak, Bjurstam, Björneld, Myles, Warwick. Estimates of overdiagnosis from two trials of mammographic screening for breast cancer. Breast Cancer Research 2005; 7: 258–65.

7. Hardcastle, Chamberlain, Robinson, Moss, Amar, Balfour, James, Mangham CM. Randomised controlled trial of faecal-occult-blood screening for colorectal cancer. Lancet 1996; 348: 1472–7.

8. Tabar, Fagerberg, Gad. Reduction in mortality from breast cancer after mass screening with mammography: randomised trial from the Breast Cancer Screening Working Group of the Swedish National Board of Health and Welfare. Lancet 1985; i: 829–32.

9. Scholefield, Moss, Mangham, Whynes, Hardcastle. Nottingham trial of faecal occult blood testing for colorectal cancer: a 20-year follow-up. Gut 2012; 61: 1036–40.

10. Shapiro. Periodic screening for breast cancer: the HIP randomized controlled trial. Monogr Natl Cancer Inst 1997; 22: 27–30.

11. Tabar, Vitak, Chen THH, Yen AMF, Cohen, Tot, Chiu SYH, Chen SLS, Fann JCY, Rosell, Fohlin, Smith, Duffy. Swedish Two-County Trial: impact of mammographic screening on breast cancer mortality during three decades. Radiol 2011; 260: 658–63.

12. Nyström, Andersson, Bjurstam, Frisell, Nordenskjöld, Rutqvist. Long term effects of mammography screening: updated overview of the Swedish randomised trials. Lancet 2002; 359: 909–19.

13. Duffy, Yen, Chen, Chiu, Fan, Smith, Vitak, Tabar. Long term benefits of breast screening. Breast Cancer Management 2012; 1(1): 31–38.

14. Gøtzsche. Relation between breast cancer mortality and screening effectiveness: systematic review of the mammography trials. Dan Med Bull 2011; 58: A426–A426.