Abstract
Objective
In screening programmes there is a recognized bias introduced through participant self-selection (the healthy screenee bias). Methods used to evaluate screening programmes include intention-to-screen, per-protocol, and the “post hoc” approach, in which, after screening has been introduced for everyone, the only evaluation option is to compare participants with non-participants. All methods are prone to bias through self-selection. We present an overview of approaches to correct for this bias.
Methods
We considered four methods to quantify and correct for self-selection bias. Simple calculations revealed that these corrections are actually all identical, and can be converted into each other. Based on this, correction factors for further situations and measures were derived. The application of these correction factors requires a number of assumptions.
Results
Using as an example the German Neuroblastoma Screening Study, no relevant reduction in mortality or stage 4 incidence due to screening was observed. The largest bias (in favour of screening) was observed when comparing participants with non-participants.
Conclusions
Correcting for bias is particularly necessary when using the post hoc evaluation approach; however, in this situation not all required data are available. External data or further assumptions may be required for estimation.
Introduction
The biases introduced through self-selection of participants in a screening programme have been described,1–13 using terms such as “healthy volunteer bias” or “healthy screenee bias”. This differs from the “structural healthy screenee bias”, which describes the fact that only asymptomatic, disease-free individuals are eligible for screening.14 If associated with the outcome (eg. mortality, advanced stage incidence, cumulative mortality or cumulative advanced stage incidence), self-selection of participants can substantially bias the results of a screening programme evaluation.
Several approaches have been used to evaluate screening programmes. Intention-to-screen (ITS) compares the study group with the control group, per-protocol (PP) compares the participants in the study group with the control group, and in the “post hoc” approach, after screening has been introduced for everyone, the only option is to compare participants with non-participants, because post hoc evaluations lack a control group. Comparisons can be made as a ratio (eg. Relative Risk, Rate Ratio, or Incidence Rate Ratio) or as the respective difference measure (eg. Risk Difference, Rate Difference). The self-selection issue pertains to all of these study designs for screening evaluation. Here we discuss four suggested methods1–4 to quantify and correct for the bias introduced by each of these approaches. To our knowledge, this is the first attempt to present an overview of these approaches for all relevant situations.
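As a rough illustration of the three comparisons described above, the following sketch computes the ratio and difference measure for each approach. The function name, the argument names (`p_s`, `p_c`, `p_st`, `p_snt` for the outcome in the study group, control group, participants and non-participants) and all numbers are assumptions chosen for illustration; they are not data from the study.

```python
def naive_effects(p_s, p_c, p_st, p_snt):
    """Uncorrected effect measures for the three evaluation approaches.

    p_s   : outcome (e.g. cumulative mortality per million) in the study group
    p_c   : outcome in the control group
    p_st  : outcome in study-group participants
    p_snt : outcome in study-group non-participants
    """
    return {
        # Intention-to-screen: whole study group versus control group
        "ITS": {"ratio": p_s / p_c, "difference": p_s - p_c},
        # Per-protocol: participants versus control group
        "PP": {"ratio": p_st / p_c, "difference": p_st - p_c},
        # Post hoc: participants versus non-participants (no control group)
        "post_hoc": {"ratio": p_st / p_snt, "difference": p_st - p_snt},
    }


# Hypothetical rates per million, invented for this example only
effects = naive_effects(p_s=9.0, p_c=10.0, p_st=6.0, p_snt=12.0)
for approach, measures in effects.items():
    print(approach, measures)
```

In this invented scenario the non-participants fare worse than the control group, so the per-protocol and post hoc comparisons look more favourable to screening than the intention-to-screen comparison does.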
We used as an example Neuroblastoma screening in Germany, tested in a large scale controlled study.15–17 Participation was voluntary and, based on the result, neuroblastoma screening was not introduced into the German early childhood examination programme. The effect of screening on advanced stage incidence and mortality was negligible, while overdiagnosis was considerable. A study in Canada with a similar design came to the same conclusion for younger children. 18
Methods
We deal here with a screening programme evaluation, where outcome measures are known for the study group, the participants in the study group and ideally a control group. The fraction of attendance in the study group is also known. The outcome in the non-participants is either known or can be derived. We assume no or negligible attendance in the control group. The common approaches for evaluation (ITS, PP, post hoc) are potentially biased due to participant self-selection.
Suggestions to correct for this bias have been made, among others, by McIntosh, Baker, Cuzick and Duffy.1–4 McIntosh 1 presents an unbiased (“causal”) alternative to the ITS and PP approaches, plus an “as treated” approach for use when figures regarding spontaneous screening in the control group are known or relevant in a randomized controlled trial (RCT) or controlled trial setting. Baker 2 presents a correction factor for a difference measure under the ITS approach in an RCT setting, citing a number of earlier papers on the issue and calling McIntosh and Cuzick 4 a “related approach”. Duffy 3 adapts the suggestion developed by Cuzick 4 for an RCT setting to a post hoc case control setting, deriving an ITS result and a causal (ie. unbiased) result by applying correction factors derived from published data. Simple calculations revealed that all three approaches are actually identical; they can be converted into each other (as already noted for some of them in previous publications2,13). Based on this, correction factors for further situations and measures are derived.
Define PS, PC, PST and PSNT as the respective outcome probabilities or rates for the screening group (S), the control group (C), participants in the screening group (ST), and non-participants in the screening group (SNT). If PC ≠ PSNT, self-selection is likely to be an issue. Define fS (0 ≤ fS ≤ 1) as the fraction of participants in the study group. The study group outcome is the mean of the outcomes in the participants and the non-participants, weighted by the fraction of participants: PS = fS · PST + (1 − fS) · PSNT.
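The weighted mean described above, and the way it can be turned into a corrected comparison, can be sketched as follows. The sketch assumes, as the correction methods discussed here do, that the control group would have split into the same fraction of would-be participants (“virtual” participants, V) and would-be non-participants as the study group, and that the would-be non-participants have the outcome PSNT. All function names and numbers are hypothetical and serve only as an illustration.

```python
def study_group_outcome(f_s, p_st, p_snt):
    """P_S as the participation-weighted mean of P_ST and P_SNT."""
    return f_s * p_st + (1.0 - f_s) * p_snt


def virtual_participant_outcome(f_s, p_c, p_snt):
    """Outcome of control-group members who would have participated (V).

    Solves P_C = f_S * P_CV + (1 - f_S) * P_SNT for P_CV, which is only
    valid under the assumptions stated in the text above.
    """
    if not 0.0 < f_s <= 1.0:
        raise ValueError("f_s must lie in (0, 1]")
    return (p_c - (1.0 - f_s) * p_snt) / f_s


def causal_effects(f_s, p_c, p_st, p_snt):
    """Participants versus virtual participants, as ratio and difference."""
    p_cv = virtual_participant_outcome(f_s, p_c, p_snt)
    return {"ratio": p_st / p_cv, "difference": p_st - p_cv}


# Hypothetical inputs (rates per million), not taken from the study
f_s, p_c, p_st, p_snt = 0.5, 10.0, 6.0, 12.0

p_s = study_group_outcome(f_s, p_st, p_snt)
causal = causal_effects(f_s, p_c, p_st, p_snt)

# Correction factor for the ITS ratio: causal ratio divided by ITS ratio
cf_its_ratio = causal["ratio"] / (p_s / p_c)
# For the difference measure, the ITS difference is divided by f_S
its_difference_corrected = (p_s - p_c) / f_s

print(p_s, causal, cf_its_ratio, its_difference_corrected)
```

With these invented inputs the ITS difference divided by fS reproduces the difference between participants and virtual participants, which illustrates how a correction for a difference measure and one for a ratio measure can rest on the same underlying calculation.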
Evaluation approaches to a screening programme.
P: Outcome measure, such as a mortality rate; S: Study group; C: Control group; T: Participant; NT: Non-participant; V: Virtual; RR: Rate Ratio, Risk Ratio; IRR: Incidence Rate Ratio.
P: Outcome measure such as a mortality rate; S: Study group; C: Control group; T: Participant; NT: Non-participant; V: Virtual.
Indirectly: the conversion from post hoc to causal is presented, as well as the conversion from post hoc to ITS.
Simple 95% confidence intervals for the result measures can be determined using the Poisson or Binomial distribution, as needed. They do not account for, eg., the uncertainty in the estimation of the “virtual participants” outcome, and may therefore be too narrow. They are not to be read as statistical tests, but as indicators of the size of variation expected, given the number of events and the population size.
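One simple way to obtain such intervals is a normal (Wald) approximation that treats the event counts as Poisson distributed. The sketch below uses this approximation for a rate difference and a rate ratio; it is a substitute chosen for brevity, not necessarily the calculation used for the study, and the function names and counts are hypothetical.

```python
import math

Z95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def rate_difference_ci(events_1, n_1, events_0, n_0, scale=1_000_000):
    """Rate difference per `scale` persons with a Wald 95% interval.

    Assumes Poisson counts, so the variance of a rate is events / n**2.
    """
    diff = events_1 / n_1 - events_0 / n_0
    se = math.sqrt(events_1 / n_1**2 + events_0 / n_0**2)
    return scale * diff, scale * (diff - Z95 * se), scale * (diff + Z95 * se)


def rate_ratio_ci(events_1, n_1, events_0, n_0):
    """Rate ratio with a Wald 95% interval computed on the log scale."""
    ratio = (events_1 / n_1) / (events_0 / n_0)
    se_log = math.sqrt(1.0 / events_1 + 1.0 / events_0)
    return ratio, ratio * math.exp(-Z95 * se_log), ratio * math.exp(Z95 * se_log)


# Hypothetical counts: 20 deaths among 2 million versus 25 among 2 million
rd, rd_low, rd_high = rate_difference_ci(20, 2_000_000, 25, 2_000_000)
rr, rr_low, rr_high = rate_ratio_ci(20, 2_000_000, 25, 2_000_000)
print(rd, rd_low, rd_high)
print(rr, rr_low, rr_high)
```

As noted above, intervals of this kind reflect only the variation in the event counts; they ignore the additional uncertainty that enters when the outcome of the virtual participants is estimated.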
The German Neuroblastoma Screening Study15–17 offered screening once, from May 1995 to early 2001, to all children in selected German states (about half the German population), at around their first birthday (age 10–24 months). The groups were thus defined as areas in this non-RCT study. The screening and control areas were comparable, prior to the screening study, with respect to incidence, stage and age distribution, and mortality.15 About 2.1 million eligible children living in the control area were not invited to participate, about 2.6 million in the screening area were invited, and about 1.5 million of these participated. We analysed the birth cohorts 1994–1999. Participation in the 1994 birth cohort was 35%; with increased publicity, the last two birth cohorts achieved about 65% participation. Publicity was kept local, to avoid too much spontaneous screening in neighbouring areas. On average, about 57.5% of all invited children participated. Fewer than 3000 children from the control area participated (0.1%). Identifying information of participants was encrypted and matched with data from the German Childhood Cancer Registry,19 enabling the identification of true positive and false negative cases, as well as incidence and mortality by age, period, stage, birth cohort, study area, control area, and participation status. Mortality follow-up, using both active and passive follow-up, as routinely performed by the German Childhood Cancer Registry, is relatively complete for the first five years after diagnosis (later deaths are very rare for Neuroblastoma).
The main outcome was mortality. We also examined stage 4 incidence (which has a particularly bad prognosis and accounts for the majority of deaths from Neuroblastoma), and other measures not presented here. Case ascertainment and mortality follow-up for the study cohorts are available as of December 2013. The outcome measures are cumulative mortality and cumulative stage 4 incidence from screening age until the maximum lead-time: 2nd–6th year of life. The method for estimating the maximum lead time and overdiagnosis, which defines the optimal case ascertainment period for this study, was presented elsewhere. 20
Results
Outcomes: cumulative mortality and cumulative stage 4-incidence per million children under risk (absolute number of cases) in the defined population subgroups in the German Neuroblastoma Screening Study for the birth cohorts 1994–1999, diagnosed between the 2nd and 6th year of life
Follow-up until end of 2013.
Outcome effects and correction factors for cumulative mortality and cumulative stage 4 incidence per million children under risk in the defined population subgroups in the German Neuroblastoma Screening Study for the birth cohorts 1994–1999, diagnosed between the 2nd and 6th year of life. Multiplying the ITS, PP and post hoc estimates by their respective correction factors yields the causal effect.
The ITS approach slightly underestimates the mortality effect and almost correctly estimates the stage 4 effect (which is almost zero), while the per-protocol approach overestimates the mortality effect slightly and the stage 4 effect considerably (Table 4). For both outcomes, the post hoc approach would have overestimated the effects considerably, as expected. However, all confidence intervals include zero. The overall conclusion is that no relevant reduction in mortality or stage 4 incidence due to screening was observed.
Discussion
Non-attendees for screening programmes have been shown to differ in several respects from attendees, including in employment, immigrant, and income status.5,6,11,12 In the prostate, lung, colorectal, and ovarian cancer screening trial, 7 mortality from all other causes (not the cancers included) in participants was about half that in the general population. Comorbidity with other (often unrelated) chronic conditions is often associated with non-participation in screening or other preventive activities, an effect found in a prostate cancer screening trial 10 and in the United Kingdom Collaborative Trial of Ovarian Cancer Screening, 21 where the difference was associated with social status.
The German Neuroblastoma screening study was controlled, but randomization was not feasible. We were able to show comparability of the study area and the control area with respect to all relevant variables, particularly the outcome variables, in the years directly preceding the screening intervention. 15 All evaluation approaches demonstrated no favourable effect of screening. The ITS approach led to the least biased result. A post hoc evaluation using the study area only, comparing participants with non-participants, would have led to a considerable (though still not significant) overestimation of the effect. We observed that children from non-participating families had less favourable stages and higher mortality.
In Germany almost every child has health insurance, and the screening study was attached to one of a series of free general examinations offered to all children until age 6, in which 90% participated in the late 1990s (Gesundheitsberichterstattung des Bundes). Pediatricians were provided with posters, and asked to hand leaflets and test kits to parents, and to relay test results back to them later, but they were not compensated, and some refused to participate. Participation increased after extra publicity efforts, when parents began to request the test from pediatricians. Some non-participation will have been related to not attending the free examinations. Non-participation due to a pediatrician’s refusal could be non-differential with respect to parental attitudes. The bias due to self-selection may therefore be underestimated.
All the considered approaches to correct for selection bias1–4 require certain prerequisites in order to be applicable. McIntosh 1 stresses the importance of knowing the maximum lead time for such comparisons. We base the method for estimating the maximum lead time in the neuroblastoma study on that of Etzioni and Self, 22 presented in detail in our previous study. 20 Ascertainment periods that are too short lead to a bias, probably in favour of screening, and periods that are too long lead to a bias probably against screening for relative effect measures, and no bias (though confidence intervals may have been too small) for difference effect measures. 22 McIntosh also requires the study and control groups to be comparable in all (other) relevant aspects (the “exclusion restriction”). The assumption that both groups would have had the same outcome if screening had not been introduced for the study group underlies all methods presented here. In an RCT, this assumption is justified. In all other settings, corroborating evidence should be sought.
Kalager et al. 9 argue that it is also vital to account for self-selection when estimating overdiagnosis bias. The method for estimating overdiagnosis in the Neuroblastoma Screening trial was presented in our previous study, 20 and the correction methods presented here can be applied.
Baker,2 who presented methods for the design and analysis of screening RCTs, noted that the correction factors depend on the two assumptions made to calculate them: i) that participation in the control group, had screening been offered, would have been the same as observed in the intervention group, and ii) that the outcome in those members of the control group who would have been non-participants had screening been offered would have been the same as that observed in the non-participants of the intervention group. These assumptions are generally reasonable, but cannot always be verified.
Duffy 3 adapted the Cuzick 4 method, originally developed for RCT settings where reliable estimates of PC and PSNT (ie. for the comparison group and the non-participants) are available, to a post hoc setting. When the correction factor formula is rewritten, it depends not so much on PC and PSNT as such, but on the ratio PSNT/PC (named the “self selection bias parameter” by Duffy). Duffy stressed the importance of obtaining reliable estimates of PC and PSNT, and used values from published RCTs. However, we consider it important for the elements of the correction factors to be derived as directly as possible from the data in question. Among other things, the PSNT/PC ratio depends on the participation fraction, as a smaller non-participant group would probably differ more from the population average than a larger one. Carrying this ratio over from a different study is therefore problematic, even if this other study is a well-designed and well-conducted RCT. Paap 23 has shown that the PSNT/PC ratio can differ considerably between subgroups of a screening programme. In a post hoc evaluation, an estimate for PSNT must be available, so it seems natural to use it. Researchers in a post hoc setting should attempt to obtain an estimate for PC based on data directly related to the study data. If this key parameter cannot be estimated, then a correction is not possible. It may, however, still be possible to gain an insight into the magnitude of the bias by selecting a range of reasonable scenarios for PC and estimating a range of correction factors based on these.
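Such a scenario analysis might look like the sketch below. It expresses the corrected ratio in terms of the post hoc ratio PST/PSNT, the participation fraction fS and the ratio PSNT/PC, obtained algebraically from the assumption that would-be non-participants in a control population have the outcome PSNT; this rewriting is our illustration and may differ in form from the published formulas. All names and values are hypothetical.

```python
def causal_ratio_from_post_hoc(post_hoc_ratio, f_s, d_r):
    """Corrected ratio from a post hoc ratio.

    post_hoc_ratio : P_ST / P_SNT
    f_s            : fraction of participants
    d_r            : P_SNT / P_C, the self-selection ratio
    """
    denominator = 1.0 - (1.0 - f_s) * d_r
    if denominator <= 0.0:
        # The scenario would imply a non-positive outcome in would-be participants
        raise ValueError("scenario for P_C is incompatible with f_s and P_SNT")
    return f_s * post_hoc_ratio * d_r / denominator


def scenario_range(p_st, p_snt, f_s, p_c_scenarios):
    """Corrected ratio and correction factor for each assumed value of P_C."""
    post_hoc_ratio = p_st / p_snt
    results = []
    for p_c in p_c_scenarios:
        causal = causal_ratio_from_post_hoc(post_hoc_ratio, f_s, p_snt / p_c)
        results.append((p_c, causal, causal / post_hoc_ratio))
    return results


# Hypothetical post hoc data and a range of assumed control outcomes
results = scenario_range(p_st=6.0, p_snt=12.0, f_s=0.5,
                         p_c_scenarios=[9.0, 10.0, 11.0])
for p_c, causal, factor in results:
    print(f"P_C={p_c:5.1f}  corrected ratio={causal:.2f}  correction factor={factor:.2f}")
```

In this invented example the corrected ratio moves from 1.0 to 0.6 as the assumed PC rises from 9 to 11 per million, which shows how sensitive a post hoc correction can be to the choice of PC.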
A controlled design, randomized or not, is preferable to an uncontrolled one as it permits estimation of the self-selection effect size and correction for this, using the respective appropriate correction factor. Post hoc studies require such an estimate just as much, but it is far more difficult to obtain.
Footnotes
Acknowledgements
We thank Frau Irene Jung for extensive support in preparing the database, and two major co-investigators (now retired) of the original Screening Trial, Professors Rudolf Erttmann and Johannes Sander.
Declaration of interests
All authors declare no conflict of interest.
Funding
The Neuroblastoma Screening Study (Modellprojekt) was funded by a grant (70/365) from Deutsche Krebshilfe (German Cancer Aid Foundation), by the German Consortium of Statutory Health Insurance Associations (Arbeitsgemeinschaft der Spitzenverbände der Gesetzlichen Krankenkassen), the Association of Private Health Insurers (Verband der privaten Krankenversicherung), and the health ministries of the states of Baden-Württemberg, Bremen, Hamburg, Niedersachsen, Nordrhein-Westfalen and Schleswig-Holstein, and by a grant (IDF 239.08) from the Federal Ministry of Health, Bonn, Germany.
Author funding
CS: Employed by the German Childhood Cancer Registry, jointly funded by German Ministries of Health (federal and state). FB is retired. BH: Employed by the University Children’s Hospital, Pediatric Oncology/Hematology, Koeln/Germany. JM is retired. FHS works as attending physician at Klinikum Stuttgart and was the former secretary of the German Neuroblastoma Screening Program.
Ethics Committee
The Neuroblastoma Screening Study was reviewed and approved by the state ethics committee of the German Medical Association in the state of Baden-Württemberg (Ethikkommission der Landesärztekammer Baden-Württemberg) 08.02.1995, Az 102/94.
References
