Abstract
Objectives
To investigate the consequences of different cancer screening trial designs and follow-up options for accuracy of the estimate of the effect of screening on disease-specific mortality.
Methods
We consider a randomized trial of breast cancer screening with a screening phase in which the intervention group is offered screening and the control group is not, and optional further follow-up after this screening phase. Postulating a lead time effect similar to that observed in breast cancer screening trials, we calculate the observed relative risk of disease-specific mortality and compare this with the true relative risk, for four design options: (1) no follow-up beyond the screening phase, ie. the screening phase and the observation period are identical; (2) follow-up continuing beyond the screening phase, all cancer-specific deaths counted, including those diagnosed after the screening phase; (3) follow-up continuing beyond the screening phase, but with only deaths from cancers diagnosed during the screening phase included; and (4) follow-up continuing beyond the screening phase, a single screen of the control group conducted at the end of the screening phase, and only deaths from cancers diagnosed during the screening phase in both arms up to completion of the single control screen included.
Results
All designs in which follow-up for mortality continues beyond the screening phase incurred a bias against screening. The design in which the control group undergoes a single screen at the end of the screening phase was least biased in the example used.
Conclusions
The expedient of a single screen of the control group at the end of the screening phase has acceptable accuracy, but is still slightly conservatively biased.
Introduction
Randomized trials of cancer screening can involve challenging design choices.1,2 Ideally, a population would be randomized to the offer of regular screening (intervention group) or to usual care (control group) for a long period of time. Deaths during that period from cancers diagnosed during that period would constitute the trial endpoint. We are here considering screening tests such as mammography or faecal occult blood testing, which are primarily aimed at detection of cancer at an earlier and more treatable stage, and not tests such as flexible sigmoidoscopy or cervical smear testing, which have a major effect of detection and removal of pre-malignancies, and therefore prevention of cancer, albeit while also detecting some cancers early.3,4
Often, research resources are not available for the screening to continue for long enough to have sufficient endpoint events for adequate statistical power. In this case, a number of options are available. One is to make screening available to the intervention group for a limited period of time, but to extend the period of observation beyond the screening phase. In this design, the intervention group is invited to screening during the earlier period, but not the later, and the control group is never invited to screening. This will entail a conservative bias, as in the later period, after the screening phase has ended, neither the intervention nor the control group is receiving screening, so there will be no screening effect on deaths from the cancers diagnosed during this period.
A second possibility is to screen the intervention group for a limited period and, as in the previous example, never screen the control group. Mortality is measured in the follow-up period, but only deaths from cancers diagnosed during the screening phase are included. This too will be conservatively biased, as there will be deaths from cancers included in the intervention group that would have been diagnosed after the end of the screening phase if no screening had taken place, but were diagnosed during the screening phase as a result of lead time.5 Their counterparts in the control group are diagnosed after the screening phase, due to the absence of lead time in this unscreened population, and are therefore excluded from the analysis.
A third option is to screen the intervention group for a limited period, and at the end of that period, offer one round of screening to the control group. In subsequent follow-up for cancer mortality, deaths are included from all cancers diagnosed in both groups between randomization and the end of the single round of screening in the control group. This design too will dilute the effect of screening, as it includes cancers diagnosed in a period when both groups are receiving the same screening intervention.
In the material following, we demonstrate these conservative biases more formally, with a numerical example.
Methods
Screening designs and endpoints
Suppose we propose a trial to screen for a cancer which, in the absence of screening, has annual incidence rate
If the screening works by diagnosing the cancer at an earlier stage, it must confer a lead time, ie. the screen-detected cancers are diagnosed some time before they would have been diagnosed in the absence of screening. Suppose that at the end of the screening phase, the intervention arm has an additional rate of 2
Design 1: Identical screening and observation periods
If our screening and observation periods are the same, say 15 years following randomization, in the control group, approximating time of diagnosis in any given year as the mid-point of that year, the observed overall rate of deaths from the relevant cancer will be
In the intervention group, it will be
Thus, this design will yield an unbiased estimated relative risk of
The first mortality report of the Nottingham trial of faecal occult blood testing (FOBT) for colorectal cancer used this design and analysis,7 and the Swedish Two-County Trial's first mortality results pertained mainly to the period before the control group was screened, and therefore correspond to this scenario.8
Design 2: Extended observation period, all cancers and deaths included
If, on the other hand, screening is offered to the intervention group for the first five years, but cancers and deaths are observed in both groups up to fifteen years, we will have in the control group
This will give a relative risk of
This will exceed θ, and will therefore be biased against the screening effect. The primary analysis of extended follow-up of the Nottingham FOBT trial used this design and analysis.9
Design 3: Extended observation period, including only deaths from cancers diagnosed during the screening phase
Now suppose that screening is offered to the intervention group for five years, but never to the control group, and both groups are followed up for 15 years, but only deaths from cancers diagnosed in each group during the first five years are included. In the control group, the overall rate of deaths from the relevant cancer will be
In the intervention group, the overall rate will be
and the observed relative risk will be
Again, this is greater than θ, and hence is conservatively biased. The analysis of extended follow-up of the Health Insurance Plan of Greater New York Breast Screening Trial used this strategy.10
Design 4: Extended observation period with an exit screen of the control group
In this design, screening is offered to the intervention group for five years, to the control group at the closure of the screening period, so that the single screening round of the control group concludes at the end of the fifth year, and only deaths in both groups from cancers diagnosed up to the end of the control group screen are included. The purpose of this design is to confer approximately the same number of additional lead time cases in the control group as in the intervention group. The overall cancer mortality in the control group will be
In the intervention arm, the overall mortality will be
as in design 3 above. Thus the observed relative risk will be
Again, this is larger than θ. The extended follow-up of the Swedish Two-County, Gothenburg and Stockholm screening trials used this estimation strategy.11,12
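The qualitative ordering of the four designs can be illustrated with a deliberately simplified sketch. This is not the calculation used in this paper. It assumes a constant annual incidence, ignores survival time apart from two points (every death from an included cancer occurs within the extended observation period, and in design 1 the cases brought forward across the end of the period die after it has closed), and represents lead time as a fixed number of extra cases diagnosed early. The function name, the argument names, and the value chosen for extra_lead_cases are hypothetical.

```python
def observed_relative_risks(incidence, fatality, theta, extra_lead_cases,
                            screen_years=5, follow_years=15):
    """Observed relative risk under each design, in a simplified model.

    incidence        -- annual incidence without screening (e.g. per 1000)
    fatality         -- case-fatality without screening
    theta            -- true relative risk of death for screen-exposed cancers
    extra_lead_cases -- cases brought forward by lead time at the end of the
                        screening phase (hypothetical illustrative quantity)
    """
    i, f, e = incidence, fatality, extra_lead_cases
    s, t = screen_years, follow_years

    # Design 1: screening and observation both last follow_years.
    # Assumption: lead-time cases at the end of the period die after it closes.
    d1_control = t * i * f
    d1_interv = t * i * theta * f

    # Design 2: screen for s years, count all deaths over t years.
    # After screening stops, the intervention arm has e fewer unscreened cases.
    d2_control = t * i * f
    d2_interv = (s * i + e) * theta * f + ((t - s) * i - e) * f

    # Design 3: count only deaths from cancers diagnosed in the screening phase.
    d3_control = s * i * f
    d3_interv = (s * i + e) * theta * f

    # Design 4: as design 3, plus an exit screen of the control group that is
    # assumed to bring forward the same e cases with the same benefit.
    d4_control = s * i * f + e * theta * f
    d4_interv = d3_interv

    return {
        "design1": d1_interv / d1_control,
        "design2": d2_interv / d2_control,
        "design3": d3_interv / d3_control,
        "design4": d4_interv / d4_control,
    }


if __name__ == "__main__":
    # Incidence 3 per 1000, case-fatality 2%, theta 0.75; extra_lead_cases = 1.0
    # per 1000 is an arbitrary illustrative value, not taken from the paper.
    for name, rr in observed_relative_risks(3.0, 0.02, 0.75, 1.0).items():
        print(f"{name}: observed RR = {rr:.3f}")
```

Under these assumptions, design 1 returns θ exactly, and designs 2 to 4 all return values above θ, with design 4 closest to θ. The numerical values are specific to the simplified model and should not be read as the estimates reported in this paper.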
Fictitious example
Suppose we are screening for a cancer with annual incidence of 3 per thousand and unscreened case-fatality of 2%, and that the offer of screening reduces this (in real, non-overdiagnosed tumours) to 1.5%, ie. a true relative risk of θ = 0.75. Suppose also the same designs and lead time effects as above. These parameters are not dissimilar to those observed in some of the breast cancer screening trials.12,13 The four options above would result in relative risk estimates of
Discussion
The examples above demonstrate that design expedients in screening trials that involve an observation period that exceeds the screening period all tend to be conservatively biased. In our example, the least biased design was number 4, that in which a single screen is offered to the control group at the end of the screening phase, and only cancers diagnosed in both groups up to the end of screening of the control group are followed up for cause-specific mortality during the observation period. This design may not always be the least conservatively biased. The ranking of the designs by degree of bias will depend on the relative size of the mortality benefit and the magnitude of lead time effects. However, the fact that all designs with observation beyond the screening period are conservative is generalizable.
It should be noted that the design of cancer screening trials also has implications for estimation of lead times and overdiagnosis rates. This is not dealt with here, but is the subject of ongoing research.
Our conclusions about design number 4 also depend on the magnitude of lead time at the end of the screening period being equal in both groups, which, in turn, will depend on the amount of actual screening exposure in both groups just before the end of the screening period. There may be some mismatch of timing between the last screen of the intervention group and the single screen of the control group. A reasonable check on whether the lead time effects are equal is given by the difference between the groups with respect to cumulative incidence of cancer at the end of the control screen. If this difference is small, then the lead time effects are likely to be the same in each group, which should be the case if randomization was effective, and the conservative nature of design number 4 will hold.
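The check described above amounts to comparing cumulative incidence between the arms at the end of the control screen. A minimal sketch follows. The function name and the counts in the usage example are invented for illustration and are not trial data. No threshold for a "small" difference is implied.

```python
def cumulative_incidence_gap(cases_interv, n_interv, cases_control, n_control,
                             per=1000):
    """Return (difference, ratio) of cumulative incidence between the arms.

    The difference is intervention minus control, expressed per `per` persons.
    The ratio is intervention over control.
    """
    ci_interv = per * cases_interv / n_interv
    ci_control = per * cases_control / n_control
    return ci_interv - ci_control, ci_interv / ci_control


if __name__ == "__main__":
    # Invented counts: cancers diagnosed up to the end of the control screen.
    diff, ratio = cumulative_incidence_gap(200, 10_000, 195, 10_000)
    print(f"difference per 1000: {diff:.2f}, ratio: {ratio:.3f}")
```

A difference close to zero (ratio close to one) would be consistent with similar lead time effects in the two arms at that point.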
The Swedish Two-County Trial is a case in point. Before the screen of the control group, there was an excess incidence in the intervention group.8 Incidence in both groups equalized immediately upon conclusion of the single screen of the control group.6
Concerns have been expressed in the past about the policy of a single screen of the control group at the end of the screening period in cancer screening trials.14 The examples above indicate that with incidence, fatality, and lead time parameters typical of the breast screening trials, this methodological approach is the least conservative of several design expedients, but still will tend to underestimate the benefit of screening.
Footnotes
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
