Correction factors for self-selection when evaluating screening programmes

Abstract

Objective

In screening programmes there is a recognized bias introduced through participant self-selection (the healthy screenee bias). Methods used to evaluate screening programmes include the intention-to-screen approach, the per-protocol approach, and the “post hoc” approach in which, after introducing screening for everyone, the only evaluation option is participants versus non-participants. All methods are prone to bias through self-selection. We present an overview of approaches to correct for this bias.

Methods

We considered four methods to quantify and correct for self-selection bias. Simple calculations revealed that these corrections are actually all identical, and can be converted into each other. Based on this, correction factors for further situations and measures were derived. The application of these correction factors requires a number of assumptions.

Results

Using as an example the German Neuroblastoma Screening Study, no relevant reduction in mortality or stage 4 incidence due to screening was observed. The largest bias (in favour of screening) was observed when comparing participants with non-participants.

Conclusions

Correcting for bias is particularly necessary when using the post hoc evaluation approach. However, in this situation not all required data are available. External data or further assumptions may be required for estimation.

Keywords

Screening evaluation, self selection, healthy screenee bias, correction factor, Neuroblastoma

Introduction

The biases introduced through self-selection of participants in a screening programme have been described,^1–13 using terms such as “healthy volunteer bias” or “healthy screenee bias”. This differs from the “structural healthy screenee bias”, which describes the fact that only asymptomatic, disease free individuals are eligible for screening.¹⁴ If it is associated with the outcome (eg. mortality, advanced stage incidence, cumulative mortality or cumulative advanced stage incidence), self-selection of participants can significantly bias the results of a screening programme evaluation.

Several approaches have been used to evaluate screening programmes. Intention-to-screen (ITS) compares the study group with the control group, per-protocol (PP) compares the participants in the study group with the control group, and in the “post hoc” approach, after introducing screening for everyone, the only option is to compare participants versus non-participants. Post hoc evaluations lack a control group. Comparisons can be made as a factor (eg. Relative Risk, Rate Ratio, or Incidence Rate Ratio) or as respective difference measures (eg. Risk Difference, Rate Difference). The self-selection issue pertains to all of the study designs for screening evaluation. We here discuss four suggested methods^1–4 to quantify and correct for the bias introduced by each of these approaches. To our knowledge, this is the first attempt to present an overview of these approaches for all relevant situations.

We used as an example Neuroblastoma screening in Germany, tested in a large scale controlled study.^15–17 Participation was voluntary and, based on the result, neuroblastoma screening was not introduced into the German early childhood examination programme. The effect of screening on advanced stage incidence and mortality was negligible, while overdiagnosis was considerable. A study in Canada with a similar design came to the same conclusion for younger children.¹⁸

Methods

We deal here with a screening programme evaluation, where outcome measures are known for the study group, the participants in the study group and ideally a control group. The fraction of attendance in the study group is also known. The outcome in the non-participants is either known or can be derived. We assume no or negligible attendance in the control group. The common approaches for evaluation (ITS, PP, post hoc) are potentially biased due to participant self-selection.

Suggestions to correct for this bias have been made, among others, by McIntosh, Baker, Cuzick and Duffy.^1–4 McIntosh¹ presents an unbiased (“causal”) alternative to the ITS and PP approaches, plus an “as treated” approach for use when figures regarding spontaneous screening in the control group are known or relevant in a randomized controlled trial (RCT) or controlled trial setting. Baker² presents a correction factor for a difference measure under the ITS approach in an RCT setting, citing a number of earlier papers on the issue and calling the methods of McIntosh¹ and Cuzick⁴ a “related approach”. Duffy³ adapts the suggestion developed by Cuzick⁴ for an RCT setting to a post hoc case control setting, deriving an ITS result and a causal (ie. unbiased) result by applying correction factors derived from published data. Simple calculations revealed that all of these approaches are actually identical; they can be converted into each other (as already noted for some of them in previous publications^2,13). Based on this, correction factors for further situations and measures are derived.

Define P_S, P_C, P_ST, and P_SNT as the respective outcome probabilities or rates for the screening group (S), the control group (C), participants in the screening group (ST), and non-participants in the screening group (SNT). If P_C ≠ P_SNT, self-selection is likely to be an issue. Define f_S (0 ≤ f_S ≤ 1) as the fraction of participants in the study group. The study group outcome is the mean of the participants’ and the non-participants’ outcomes, weighted by the fraction of participants, $P_{S} = P_{ST} f_{S} + P_{SNT} (1 - f_{S})$, which can also be used to estimate the non-participants’ outcome if only the outcomes of the total study group and of the participants were observed: $P_{SNT} = \frac{P_{S} - P_{ST} f_{S}}{1 - f_{S}}$. McIntosh¹ defines a group of what could be called “virtual participants (VT)”: those in the control group who would have attended, had the intervention been offered to them. Their outcome is P_CVT, and that of the corresponding virtual non-participants is P_CVNT. Comparing the participants in the study group with the virtual participants in the control group gives an estimate of the effect unbiased by self-selection, called a “causal effect measure”. The group of virtual participants cannot be observed directly. To estimate the outcome of this group, two assumptions must be made: the fraction of virtual participants in the control group, f_C, is the same as the fraction of participants in the study group, f_C = f_S; and the virtual non-participants in the control group have the same outcome as the observed study group non-participants, P_CVNT = P_SNT. Via $P_{C} = P_{CVT} f_{S} + P_{SNT} (1 - f_{S})$ these assumptions lead to the simple estimate $P_{CVT} = \frac{P_{C} - P_{SNT} (1 - f_{S})}{f_{S}}$. (For the practical process see the appendix, available online.)
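The two estimates above can be sketched in a few lines of Python. This is an illustration only, not the authors' implementation: the function names are hypothetical, and the inputs are the rounded mortality figures per million from Table 3 together with the average participation of about 57.5% reported for the study.

```python
def non_participant_outcome(p_s: float, p_st: float, f_s: float) -> float:
    """Back out P_SNT from P_S = P_ST * f_S + P_SNT * (1 - f_S)."""
    if not 0.0 <= f_s < 1.0:
        raise ValueError("f_s must lie in [0, 1) so that non-participants exist")
    return (p_s - p_st * f_s) / (1.0 - f_s)


def virtual_participant_outcome(p_c: float, p_snt: float, f_s: float) -> float:
    """Estimate P_CVT under the two assumptions f_C = f_S and P_CVNT = P_SNT."""
    if not 0.0 < f_s <= 1.0:
        raise ValueError("f_s must lie in (0, 1] so that participants exist")
    return (p_c - p_snt * (1.0 - f_s)) / f_s


# Rounded mortality per million from Table 3; f_S = 0.575 is the average participation.
p_snt = non_participant_outcome(p_s=35.3, p_st=31.0, f_s=0.575)
p_cvt = virtual_participant_outcome(p_c=37.8, p_snt=41.0, f_s=0.575)
print(f"P_SNT ~ {p_snt:.1f}, P_CVT ~ {p_cvt:.1f} per million")
```

Because the inputs are already rounded, the back-calculated P_SNT differs from the tabulated value in the first decimal place; the estimate of P_CVT agrees with Table 3 to one decimal.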

Study results can be estimated using the approaches in Table 1. Taking the causal effect estimate described above as the gold standard (see Table 1, top line), appropriate correction factors can directly be derived (Table 2). Those described before are indicated by references; the others were derived by comparing the biased effect measures with the causal effect. Multiplying a biased effect estimate by the respective correction factor yields the causal estimate. The correction factors require obtaining an estimate for P_S (or P_SNT), P_ST, f_S and P_C.
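As an example of how such a factor arises, the factor $\frac{1}{f_{S}}$ for the ITS difference measure in Table 2 follows in three steps from the definitions of P_S and P_CVT given above. The derivation uses no symbols beyond those already defined.

```latex
\begin{align}
P_{ST} - P_{CVT}
  &= P_{ST} - \frac{P_{C} - P_{SNT} (1 - f_{S})}{f_{S}} \\
  &= \frac{P_{ST} f_{S} + P_{SNT} (1 - f_{S}) - P_{C}}{f_{S}} \\
  &= \frac{1}{f_{S}} \left( P_{S} - P_{C} \right)
\end{align}
```

The causal difference is therefore the ITS difference divided by the participation fraction, and the remaining difference factors in Table 2 follow by dividing this expression by the respective biased difference.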

Table 1.

Evaluation approaches to a screening programme.

Evaluation approach	Relative comparison (RR,IRR)	Absolute comparison (Rate difference, percentage points)
Causal (unbiased)¹	P_ST/P_CVT	P_ST - P_CVT
Intention to screen (ITS)	P_S/P_C	P_S - P_C
Per protocol (PP)	P_ST/P_C	P_ST - P_C
post hoc (no control group)	P_ST/P_SNT	P_ST - P_SNT

P: Outcome measure, such as a mortality rate; S: Study group; C: Control group; T: Participant; NT: Non-participant; V: Virtual; RR: Rate Ratio, Risk Ratio; IRR: Incidence Rate Ratio.

Table 2.

Correction factors for biased evaluation approaches derived from references^1–4

Evaluation approach	Result measure	Correction factor to convert to causal/ unbiased result	Suggested by
Intention to screen (ITS)	P_S/P_C	$\frac{P_{C} P_{ST}}{P_{S}} \frac{f_{S}}{(P_{C} - (1 - f_{S}) P_{SNT})}$	(Duffy³)*
	P_S - P_C	$\frac{1}{f_{S}}$	Baker²
Per protocol (PP)	P_ST/P_C	$P_{C} \frac{f_{S}}{(P_{C} - (1 - f_{S}) P_{SNT})}$
	P_ST - P_C	$\frac{1}{f_{S}} \frac{P_{S} - P_{C}}{(P_{ST} - P_{C})}$
post hoc (no control group)	P_ST/P_SNT	$P_{SNT} \frac{f_{S}}{(P_{C} - (1 - f_{S}) P_{SNT})}$	Cuzick,⁴ Duffy³
	P_ST - P_SNT	$\frac{1}{f_{S}} \frac{P_{S} - P_{C}}{(P_{ST} - P_{SNT})}$

P: Outcome measure such as a mortality rate; S: Study group; C: Control group; T: Participant; NT: Non-participant; V: Virtual.

*Indirectly: the conversion from post hoc to causal is presented, as well as the conversion from post hoc to ITS.
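The correction factors in Table 2 can be written down directly and checked numerically. The following Python sketch does so. The class and function names are hypothetical, and the input values are invented purely to exercise the formulas; they are not study data.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcomes:
    """Observed quantities; names follow the paper's symbols."""
    p_st: float   # participants in the study group
    p_snt: float  # non-participants in the study group
    p_c: float    # control group
    f_s: float    # participation fraction in the study group

    @property
    def p_s(self) -> float:
        return self.p_st * self.f_s + self.p_snt * (1.0 - self.f_s)

    @property
    def p_cvt(self) -> float:
        return (self.p_c - self.p_snt * (1.0 - self.f_s)) / self.f_s


def ratio_factors(o: Outcomes) -> dict:
    """Factors for the relative measures (Table 2, ratio rows)."""
    denom = o.p_c - (1.0 - o.f_s) * o.p_snt
    return {
        "ITS": o.p_c * o.p_st / o.p_s * o.f_s / denom,
        "PP": o.p_c * o.f_s / denom,
        "post hoc": o.p_snt * o.f_s / denom,
    }


def difference_factors(o: Outcomes) -> dict:
    """Factors for the absolute measures (Table 2, difference rows)."""
    return {
        "ITS": 1.0 / o.f_s,
        "PP": (o.p_s - o.p_c) / (o.f_s * (o.p_st - o.p_c)),
        "post hoc": (o.p_s - o.p_c) / (o.f_s * (o.p_st - o.p_snt)),
    }


# Invented illustrative values, not taken from any study.
example = Outcomes(p_st=20.0, p_snt=40.0, p_c=30.0, f_s=0.6)
biased_ratio = {"ITS": example.p_s / example.p_c,
                "PP": example.p_st / example.p_c,
                "post hoc": example.p_st / example.p_snt}
biased_diff = {"ITS": example.p_s - example.p_c,
               "PP": example.p_st - example.p_c,
               "post hoc": example.p_st - example.p_snt}
causal_ratio = example.p_st / example.p_cvt
causal_diff = example.p_st - example.p_cvt

for name in biased_ratio:
    # Biased estimate times its factor should reproduce the causal estimate.
    assert math.isclose(biased_ratio[name] * ratio_factors(example)[name], causal_ratio)
    assert math.isclose(biased_diff[name] * difference_factors(example)[name], causal_diff)
print("all six factors reproduce the causal estimates")
```

Only the ITS difference factor is free of outcome values; every other factor needs an estimate of P_C or P_SNT, which is why the post hoc setting is the difficult one.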

Simple 95%-confidence intervals for the result measures can be determined using the Poisson or Binomial distribution, as needed. They do not account for, eg. the uncertainty in the estimation of the “virtual participants” outcome and may be too narrow. They are not to be read as statistical tests, but as indicators of the size of variation expected, given the number of events and the population size.
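One simple way to obtain such an interval for a rate difference is a normal approximation to two independent Poisson counts. The sketch below is an assumption on our part, since the exact interval method is not spelled out here, and it should not be expected to reproduce the published intervals. The event counts are the mortality counts from Table 3; the population sizes are rounded values back-calculated from the counts and rates in that table and are only approximate.

```python
import math


def rate_difference_ci(events_a: int, pop_a: float,
                       events_b: int, pop_b: float,
                       scale: float = 1e6, z: float = 1.96):
    """Difference of two rates (a minus b) with a normal-approximation interval.

    Each count is treated as Poisson, so the variance of a rate is events / pop**2.
    Uncertainty in any derived quantity (eg. P_CVT) is ignored.
    """
    diff = events_a / pop_a - events_b / pop_b
    se = math.sqrt(events_a / pop_a**2 + events_b / pop_b**2)
    return scale * diff, scale * (diff - z * se), scale * (diff + z * se)


# ITS comparison for mortality: screening area versus control area.
# Population sizes are approximate assumptions (see lead-in).
estimate, lower, upper = rate_difference_ci(events_a=91, pop_a=2.58e6,
                                            events_b=80, pop_b=2.12e6)
print(f"ITS difference per million: {estimate:.1f} [{lower:.1f}; {upper:.1f}]")
```

With event counts this small, an exact Poisson or Binomial interval would be a reasonable alternative to the normal approximation.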

The German Neuroblastoma Screening Study^15–17 offered screening once, from May 1995 to early 2001, to all children in selected German states (about half the German population), at around their first birthday (age 10–24 months). The groups were thus defined as areas in this non-RCT study. The screening and control areas were comparable, prior to the screening study, with respect to incidence, stage and age distribution, and mortality.¹⁵ About 2.1 million eligible children living in the control area were not invited to participate, about 2.6 million in the screening area were invited, and about 1.5 million of these participated. We analysed the birth cohorts 1994–1999. Participation in the 1994 birth cohort was 35%; with increased publicity, the last two birth cohorts achieved about 65% participation. Publicity was kept local, to avoid too much spontaneous screening in neighboring areas. On average, about 57.5% of all invited children participated. Fewer than 3000 children from the control area participated (0.1%). Identifying information of participants was encrypted and matched with data from the German Childhood Cancer Registry,¹⁹ enabling the identification of true positive and false negative cases, as well as incidence and mortality by age, period, stage, birth cohort, study area, control area, and participation status. Mortality follow-up, using both active and passive follow-up, as routinely performed by the German Childhood Cancer Registry, is relatively complete for the first five years after diagnosis (later deaths are very rare for Neuroblastoma).

The main outcome was mortality. We also examined stage 4 incidence (which has a particularly bad prognosis and accounts for the majority of deaths from Neuroblastoma), and other measures not presented here. Case ascertainment and mortality follow-up for the study cohorts are available as of December 2013. The outcome measures are cumulative mortality and cumulative stage 4 incidence from screening age until the maximum lead-time: 2nd–6th year of life. The method for estimating the maximum lead time and overdiagnosis, which defines the optimal case ascertainment period for this study, was presented elsewhere.²⁰

Results

For the Neuroblastoma study, we chose risk differences as the outcome measure. Both mortality and stage 4 incidence in non-participants were noticeably higher than in the control area, suggesting possible self-selection bias (Table 3). The results for the post hoc approach are not relevant for this study, as a control group (control area) was available, and are presented here only for the completeness of the worked example. Table 4 shows the effect of the screening. The causal effect is the one without self-selection bias. Multiplying the ITS, PP, or post hoc effects by their respective correction factors yields the causal effect.

Table 3.

Outcomes: cumulative mortality and cumulative stage 4 incidence per million children at risk (absolute number of cases) in the defined population subgroups in the German Neuroblastoma Screening Study for the birth cohorts 1994–1999, diagnosed between the 2nd and 6th year of life.

	Screening area			Control area
Outcome	All of screening area (S) Mortality/ incidence (N)	Participants (ST) Mortality/ incidence (N)	Non-Participants (SNT) Mortality/ incidence (N)	All of control area (C) Mortality/ incidence (N)	Virtual participants (CVT) Mortality/ incidence (N)	Virtual Non-participants (CVNT) Mortality/ incidence (N)
Mortality	35.3 (91)	31.0 (46)	41.0 (45)	37.8 (80)	35.4 (43.3)	41.0 (36.7)
Stage 4 incidence	50.0 (129)	45.2 (67)	56.5 (62)	50.1 (106)	45.4 (55.5)	56.5 (50.5)

Follow-up until end of 2013.

Table 4.

Effects and correction factors for cumulative mortality and cumulative stage 4 incidence per million children at risk in the defined population subgroups in the German Neuroblastoma Screening Study for the birth cohorts 1994–1999, diagnosed between the 2nd and 6th year of life. Multiplying the ITS, PP and post hoc effects by their respective correction factors yields the causal effect.

Evaluation approach	Absolute comparison: Mortality reduction (difference of cumulative mortality) with 95%-confidence interval	Correction factor	Absolute comparison: Stage 4 incidence reduction (difference of cumulative incidence) with 95%-confidence interval	Correction factor
Causal effect	−4.4 [−16.5; 8.5]	–	−0.2 [−14.7;14.4]	–
Intention to screen (ITS)	−2.5 [−12.2;7.6]	1.74	−0.1 [−11.7;11.5]	1.74
Per protocol (PP)	−6.8 [−22.3;6.4]	0.65	−4.9 [−22.7;11.2]	0.04
post hoc (no control group)	−10.0 [−4.2;21.2]	0.44	−11.4 [−24.7;5.4]	0.02

The ITS approach slightly underestimates the mortality effect and almost correctly estimates the stage 4 effect (which is almost zero), while the per protocol approach overestimates the mortality effect slightly and the stage 4 effect considerably (Table 4). For both outcomes, the post hoc approach would have overestimated the effects considerably, as expected. However, all confidence intervals include zero. The overall conclusion is that no relevant reduction in mortality or stage 4 incidence due to screening was observed.
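The point estimates in Table 4 can be retraced from Table 3 with the formulas of Table 1. The Python sketch below does this under stated assumptions: the function name is hypothetical, the inputs are the rounded published values, and f_S is taken as the average participation of 0.575, so results agree with Table 4 only up to rounding.

```python
F_S = 0.575  # average participation fraction reported for the study

# Rounded values per million children, copied from Table 3.
TABLE_3 = {
    "mortality": {"p_s": 35.3, "p_st": 31.0, "p_snt": 41.0, "p_c": 37.8},
    "stage 4 incidence": {"p_s": 50.0, "p_st": 45.2, "p_snt": 56.5, "p_c": 50.1},
}


def difference_effects(p_s, p_st, p_snt, p_c, f_s=F_S):
    """Absolute effects for the four evaluation approaches of Table 1."""
    p_cvt = (p_c - p_snt * (1.0 - f_s)) / f_s  # virtual participants
    return {
        "causal": p_st - p_cvt,
        "ITS": p_s - p_c,
        "PP": p_st - p_c,
        "post hoc": p_st - p_snt,
    }


effects = {outcome: difference_effects(**values)
           for outcome, values in TABLE_3.items()}

for outcome, by_approach in effects.items():
    summary = ", ".join(f"{name}: {value:+.1f}" for name, value in by_approach.items())
    print(f"{outcome}: {summary}")
```

For both outcomes the absolute size of the estimated effect grows from ITS through the causal and PP estimates to the post hoc estimate, which is the ordering described in the text.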

Discussion

Non-attendees for screening programmes have been shown to differ in several respects from attendees, including in employment, immigrant, and income status.^5,6,11,12 In the prostate, lung, colorectal, and ovarian cancer screening trial,⁷ mortality from all other causes (not the cancers included) in participants was about half that in the general population. Comorbidity with other (often unrelated) chronic conditions is often associated with non-participation in screening or other preventive activities, an effect found in a prostate cancer screening trial¹⁰ and in the United Kingdom Collaborative Trial of Ovarian Cancer Screening,²¹ where the difference was associated with social status.

The German Neuroblastoma screening study was controlled, but randomization was not feasible. We were able to show comparability of the study area and the control area with respect to all relevant variables, particularly the outcome variables, in the years directly preceding the screening intervention.¹⁵ All evaluation approaches demonstrated no favourable effect of screening. The ITS approach led to the least biased result. A post hoc evaluation using the study area only, comparing participants with non-participants, would have led to a considerable (though still not significant) overestimation of the effect. We observed that children from non-participating families had less favourable stages and higher mortality.

In Germany almost every child has health insurance, and the screening study was attached to one of a series of free general examinations offered to all children until age 6, in which 90% participated in the late 1990s (Gesundheitsberichterstattung des Bundes). Pediatricians were provided with posters, and asked to hand leaflets and test kits to parents, and to relay test results back to them later, but they were not compensated, and some refused to participate. Participation increased after extra publicity efforts, when parents began to request the test from pediatricians. Some non-participation will have been related to not attending the free examinations. Non-participation due to pediatricians’ refusal could be non-differential with respect to parental attitudes. The bias due to self-selection may therefore possibly be underestimated.

All the considered approaches to correct for selection bias^1–4 require certain prerequisites in order to be applicable. McIntosh¹ stresses the importance of knowing the maximum lead time for such comparisons. We base the method for estimating the maximum lead time in the neuroblastoma study on that of Etzioni and Self,²² presented in detail in our previous study.²⁰ Ascertainment periods that are too short lead to a bias, probably in favour of screening, and periods that are too long lead to a bias probably against screening for relative effect measures, and no bias (though confidence intervals may have been too small) for difference effect measures.²² McIntosh also requires the study and control groups to be comparable in all (other) relevant aspects (the “exclusion restriction”). The assumption that both groups would have had the same outcome if screening had not been introduced for the study group underlies all methods presented here. In an RCT, this assumption is justified. In all other settings, corroborating evidence should be sought.

Kalager et al.⁹ argue that it is also vital to account for self-selection when estimating overdiagnosis bias. The method for estimating overdiagnosis in the Neuroblastoma Screening trial was presented in our previous study,²⁰ and the correction methods presented here can be applied.

Baker,² who presented methods for the design and analysis of screening RCTs, noted that the correction factors depend on the two assumptions made to calculate them: that (i) the participation in the control group, had screening been offered, and (ii) the outcome in those in the control group who would have been non-participants had screening been offered to them, would have been the same as observed in the intervention group. These assumptions are generally reasonable, but cannot always be verified.

Duffy³ adapted the Cuzick⁴ method, originally developed for RCT settings where reliable estimates of P_C and P_SNT (ie. for the comparison group and the non-participants) are available, to a post hoc setting. Rewriting the correction factor formula shows that it depends not so much on P_C and P_SNT as such, but on the ratio P_SNT/P_C (named the “self selection bias parameter” by Duffy). Duffy stressed the importance of obtaining reliable estimates of P_C and P_SNT, and used values from published RCTs. However, we consider it important for the elements of the correction factors to be as directly derived from the data in question as possible. Among other things, the P_SNT/P_C ratio depends on the participation fraction, as a smaller non-participant group would probably be more different from the population average than a larger one. It cannot be entirely unproblematic to carry this ratio over from a different study, even if this other study is a well-designed and conducted RCT. Paap²³ has shown that the P_SNT/P_C ratio can differ considerably in subgroups of a screening programme. In a post hoc evaluation, an estimate for P_SNT must be available, so it seems natural to use it. Researchers in a post hoc setting should attempt to obtain an estimate for P_C based on data directly related to the study data. If this key parameter cannot be estimated, then a correction is not possible. It may, however, still be possible to gain an insight into the magnitude of the bias by selecting a range of reasonable scenarios for P_C and estimating a range of correction factors based on these.
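Such a scenario analysis could look like the following Python sketch. It is an illustration under assumptions: the function name is hypothetical, the P_C scenarios are invented, and P_ST, P_SNT and f_S are the rounded mortality values from Table 3, used here as if no control area had been available.

```python
def post_hoc_correction(p_st: float, p_snt: float, p_c: float, f_s: float):
    """Return (correction factor, corrected ratio) for the post hoc ratio P_ST / P_SNT.

    The factor is P_SNT * f_S / (P_C - (1 - f_S) * P_SNT), as in Table 2.
    """
    denom = p_c - (1.0 - f_s) * p_snt
    if denom <= 0.0:
        raise ValueError("scenario implies a non-positive P_CVT; choose a larger p_c")
    factor = p_snt * f_s / denom
    return factor, (p_st / p_snt) * factor


P_ST, P_SNT, F_S = 31.0, 41.0, 0.575          # per million, from Table 3
SCENARIOS = [34.0, 36.0, 38.0, 40.0, 42.0]    # hypothetical values for P_C

results = {p_c: post_hoc_correction(P_ST, P_SNT, p_c, F_S) for p_c in SCENARIOS}

print(f"uncorrected post hoc ratio: {P_ST / P_SNT:.2f}")
for p_c, (factor, corrected) in results.items():
    print(f"P_C = {p_c:.0f}: factor {factor:.2f}, corrected ratio {corrected:.2f}")
```

The corrected ratio falls as the assumed P_C rises, so the spread of results across plausible scenarios gives a direct impression of how sensitive a post hoc conclusion is to this unobserved quantity.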

A controlled design, randomized or not, is preferable to an uncontrolled one as it permits estimation of the self-selection effect size and correction for this, using the respective appropriate correction factor. Post hoc studies require such an estimate just as much, but it is far more difficult to obtain.

Footnotes

Acknowledgements

We thank Frau Irene Jung for extensive support in preparing the database, and two major co-investigators (now retired) of the original Screening Trial, Professors Rudolf Erttmann and Johannes Sander.

Declaration of interests

All authors declare no conflict of interest.

Funding

The Neuroblastoma Screening Study (Modellprojekt) was funded by a grant (70/365) from Deutsche Krebshilfe (German Cancer Aid Foundation), by the German Consortium of Statutory Health Insurance Associations (Arbeitsgemeinschaft der Spitzenverbände der Gesetzlichen Krankenkassen, den Verband der privaten Krankenversicherung, den Gesundheitsministerien der Länder Baden-Württemberg, Bremen, Hamburg, Niedersachsen, Nordrhein-Westfalen und Schleswig-Holstein), and by a grant (IDF 239.08) from the Federal Ministry of Health, Bonn, Germany.

Author funding

CS: Employed by the German Childhood Cancer Registry, jointly funded by German Ministries of Health (federal and state). FB is retired. BH: Employed by the University Children’s Hospital, Pediatric Oncology/Hematology, Koeln/Germany. JM is retired. FHS works as attending physician at Klinikum Stuttgart and was the former secretary of the German Neuroblastoma Screening Program.

Ethics Committee

The Neuroblastoma Screening Study was reviewed and approved by the state ethics committee of the German Medical Association in the state of Baden-Württemberg (Ethikkommission der Landesärztekammer Baden-Württemberg) 08.02.1995, Az 102/94.

References

1. McIntosh. Instrumental variables when evaluating screening trials: estimating the benefit of detecting cancer by screening. Statistics in medicine 1999; 18: 2775–94.

2. Baker, Kramer, Prorok. Statistical issues in randomized trials of cancer screening. BMC medical research methodology 2002; 2: 11–11.

3. Duffy, Cuzick, Tabar, Chen THH, Yen, Smith. Correcting for non-compliance bias in case-control studies to evaluate cancer screening programmes. Appl Stat 2002; 51: 235–43.

4. Cuzick, Edwards, Segnan. Adjusting for non-compliance and contamination in randomized clinical trials. Statistics in medicine 1997; 16: 1017–29.

5. Lagerlund, Maxwell, Bastani, Thurfjell, Ekbom, Lambe. Sociodemographic predictors of non-attendance at invitational mammography screening–a population-based register study (Sweden). Cancer causes & control: CCC 2002; 13: 73–82.

6. Singh, Paszat, Vinden, Rabeneck. Association of socioeconomic status and receipt of colorectal cancer investigations: a population-based retrospective cohort study. CMAJ: Canadian Medical Association journal = journal de l'Association medicale canadienne 2004; 171: 461–5.

7. Pinsky, Miller, Kramer. Evidence of a healthy volunteer effect in the prostate, lung, colorectal, and ovarian cancer screening trial. American journal of epidemiology 2007; 165: 874–81.

8. Croswell, Ransohoff, Kramer. Principles of cancer screening: lessons from history and study design issues. Seminars in oncology 2010; 37: 202–15.

9. Kalager, Loberg, Fonnebo, Bretthauer. Failure to account for selection-bias. International journal of cancer 2013; 133: 2751–3.

10. Kranse, van Leeuwen, Hakulinen. Excess all-cause mortality in the evaluation of a screening trial to account for selective participation. Journal of medical screening 2013; 20: 39–45.

11. Lagerlund, Drake, Wirfalt, Sontrop, Zackrisson. Health-related lifestyle factors and mammography screening attendance in a Swedish cohort study. European journal of cancer prevention: the official journal of the European Cancer Prevention Organisation (ECP) 2014.

12. Lagerlund, Sontrop, Zackrisson. Psychosocial factors and attendance at a population-based mammography screening program in a cohort of Swedish women. BMC women's health 2014; 14: 33–33.

13. Swedish Organised Service Screening Evaluation Group. Reduction in breast cancer mortality from the organised service screening with mammography: 2. Validation with alternative analytic methods. Cancer epidemiology, biomarkers & prevention: a publication of the American Association for Cancer Research, cosponsored by the American Society of Preventive Oncology 2006; 15: 52–6.

14. Flanders, Longini Jr. Estimating benefits of screening from observational cohort studies. Statistics in medicine 1990; 9: 969–80.

15. Schilling, Spix, Berthold. Neuroblastoma screening at one year of age. The New England journal of medicine 2002; 346: 1047–53.

16. Schilling, Spix, Berthold. German neuroblastoma mass screening study at 12 months of age: Statistical aspects and preliminary results. Medical and pediatric oncology 1998; 31: 435–41.

17. Schilling, Spix, Berthold. Children may not benefit from neuroblastoma screening at 1 year of age. Updated results of the population based controlled trial in Germany. Cancer letters 2003; 197: 19–28.

18. Woods, Gao, Shuster. Screening of infants and mortality due to neuroblastoma. The New England journal of medicine 2002; 346: 1041–6.

19. Kaatsch. The German Childhood Cancer Registry two decades after its inception. Monatsschr Kinderh 2002; 150: 966–+.

20. Spix, Michaelis, Berthold, Erttmann, Sander, Schilling. Lead-time and overdiagnosis estimation in neuroblastoma screening. Statistics in medicine 2003; 22: 2877–92.

21. Burnell, Gentry-Maharaj, Ryan. Impact on mortality and cancer incidence rates of using random invitation from population registers for recruitment to trials. Trials 2011; 12: 61–61.

22. Etzioni, Self. On the Catch-up Time Method for Analyzing Cancer Screening Trials. Biometrics 1995; 51: 31–43.

23. Paap, Verbeek, Puliti, Broeders, Paci. Minor influence of self-selection bias on the effectiveness of breast cancer screening in case-control studies in the Netherlands. Journal of medical screening 2011; 18: 142–6.
