
Abstract

Objectives

When evaluating screening for the early detection of cancer, it is important to estimate both harms and benefits. One common harm is a false-positive (FP), which is a positive screening result, perhaps followed by an invasive test, with no cancer detected on the diagnostic work-up or within a specified time period. An important goal is to estimate the risk of at least one FP, which we call the cumulative risk of an FP, if persons took a regimen of various screening tests, as is commonly recommended. The estimation is complicated because the data come from a study in which subjects are offered various screening tests in rounds with some missing tests in most subjects. Previous methods for estimating cumulative risk of FPs with a single type of test are not directly applicable, so a new approach was developed.

Methods

The tests were ordered by appearance, where the last test was either the first FP (analogous to a failure time) or the last test taken with no FPs having occurred on that test or previously (analogous to a censoring time). We applied a Kaplan-Meier approach for survival analysis with the innovation that the hazard for a first FP for a given test depends on the type of test and number of previous tests of that type which were taken.

Results

The method is illustrated with data from the screening arm of the randomized Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial. With an FP defined as a diagnostic work-up in the absence of cancer (or advanced adenoma) within three years, the probability of at least one FP among 14 tests in men was 60.5% with 95% confidence interval of (59.3%, 61.6%).

Conclusion

A simple estimate is proposed for the probability of at least one FP if persons took a regimen of multiple screening tests of different types. The methodology is useful for summarizing the burden of multiphasic screening programmes.

Background

Screening persons for the early detection of cancer has potential benefits, but also some inevitable harms. One type of harm is a false-positive (FP) result, which is a test indicating the presence of cancer that is not confirmed by diagnostic work-up or by a later appearance of cancer within a few years. The problem of FP results is compounded as more recommendations are made for various types of cancer screening, each at regular intervals. To appreciate this harm, an important issue is to estimate the probability of at least one FP, which we call the cumulative risk of an FP, if persons embark on a recommended regimen of multiple screening tests. The estimation is challenging because most subjects were missing at least one screening test in the data used for estimation.

Methods have been developed to estimate the risk of at least one FP in a regimen of a single type of screening test.^1–5 There are two ways to directly apply methods for a single screening modality to multiple types of tests. One way is to separately model the FP rate for each type of test. However, this approach assumes independence of FP rates among different types of tests, which is not likely to hold for two tests for the same cancer (such as digital rectal exam [DRE] and prostate-specific antigen [PSA] for early detection of prostate cancer) or when a single risk factor (e.g. smoking) predisposes to FP tests for more than one cancer. A second approach is to list all tests (regardless of type) by order of appearance in each subject and invoke the methodology for a single type of test.⁶ The problem is that, because of the missing data, a particular ‘test’ is a composite of different types of tests, which complicates the interpretation of assumptions and results.

We propose a novel approach that avoids the aforementioned difficulties by treating the type of test and the number of previous tests of that type as a time-varying covariate in a survival framework. A related survival approach that uses the type of test and testing round as a time-varying covariate was applied in a companion paper,⁶ but without statistical justification.

Methods

We describe the method in terms of the ongoing Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial, which randomized 154,935 persons aged 55-74 between 1993 and 2001. Our goal was to estimate the probability of at least one FP in 14 tests for men if no tests were missing. The data available for our analysis consisted of screens at rounds (yearly clinical visits) 0, 1, 2, 3, which correspond to years since randomization among 38,349 men⁶ in the screening arm of the trial. Four screening tests were offered to men in the trial: PSA and DRE for early detection of prostate cancer, postero-anterior view of chest X-rays for early detection of lung cancer and 60-cm flexible sigmoidoscopy (FSG) for early detection of colorectal cancer. For PSA, X-ray and DRE screening, we defined an FP as a positive test (i.e. suspicious for cancer) and no cancer detected within three years. For FSG screening, we defined an FP as a positive test and no advanced adenoma (namely no adenoma greater than or equal to 1 cm, no severe dysplasia and no villous components) detected within three years. (In the definition of FP used here, some subjects with an FP had an invasive work-up and others did not.)
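The FP definition above can be read as a simple decision rule. The following Python sketch is illustrative only: the function name, its arguments and the choice to treat a detection at exactly three years as falling within the window are assumptions, not part of the trial protocol.

```python
from typing import Optional

WINDOW_YEARS = 3.0  # follow-up window stated in the text


def is_false_positive(test_positive: bool,
                      years_to_target_lesion: Optional[float],
                      window_years: float = WINDOW_YEARS) -> bool:
    """Return True when a positive screen has no target lesion within the window.

    years_to_target_lesion is None when no cancer (or, for FSG, no advanced
    adenoma) was recorded; otherwise it is the time from the screen to detection.
    """
    if not test_positive:
        return False  # a negative screen cannot be a false-positive
    if years_to_target_lesion is None:
        return True  # positive screen, nothing ever detected
    # Assumption: detection at exactly window_years counts as "within" the window.
    return years_to_target_lesion > window_years
```

For example, a positive PSA followed by a prostate cancer diagnosis 1.5 years later would not count as an FP under this rule, whereas the same positive test followed by a diagnosis four years later would.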

Within rounds 0 and 3, FSG came last. Among men, blood samples for PSA were generally taken initially and always prior to DRE. The only question in the ordering was whether, for a particular person, X-ray was scheduled prior to DRE within a round or vice versa. However, this information was not available. Our primary scenario specifies that X-ray was scheduled prior to DRE for all subjects within a round, so the ordering was round 0: PSA, X-ray, DRE, FSG; round 1: PSA, X-ray, DRE; round 2: PSA, X-ray, DRE and round 3: PSA, X-ray, DRE, FSG (Table 1). We also did a sensitivity analysis in which DRE came before X-ray. Results were nearly identical (data not shown).

Table 1:

Prostate, lung, colorectal and ovarian data for false-positive (FP) rate in men

Ordered number	Test	Number at risk (took test with no FP on a previous ordered test)	Number of first FP for the test	Hazard for first FP on test
1	PSA	32,533	1768	0.0543
	X-ray	32,243	2966	0.0920
	DRE	27,918	1735	0.0621
	FSG	24,765	5890	0.2378
2	PSA	19,790	416	0.0210
	X-ray	20,457	996	0.0487
	DRE	18,371	699	0.0380
3	PSA	16,622	351	0.0211
	X-ray	17,095	695	0.0407
	DRE	15,530	545	0.0351
4	PSA	12,661	253	0.0200
	X-ray	10,051	389	0.0387
	DRE	11,864	325	0.0274
	FSG	3,013	404	0.1341

The total risk over the 14 tests equals one minus the product, over all 14 tests, of (1 - hazard).

FPs had invasive work-up in the absence of cancer (adenoma) within three years. There was a protocol change on round 3 leading to fewer FSGs.

PSA, prostate-specific antigen; DRE, digital rectal exam; FSG, flexible sigmoidoscopy.

Most subjects were missing various tests, with some missingness because of changes in protocol. During the trial, the second FSG screening was changed from round 3 to round 5 owing to changes in common practice and guidelines. This meant that fewer subjects received FSG on round 3 than originally anticipated.

It is convenient to describe the analysis method using hypothetical data in Table 2. Table 2a depicts the hypothetical data of four people who enter the screening programme. The first step in the analysis was to treat any test result after a first FP as censored (Table 2b). The second step, at least conceptually, was to list the tests taken in the order of appearance until the first FP (Table 2c). The last test in the ordering was either the first FP (analogous to a failure time) or the last test taken with no FPs having occurred on that test or previously (analogous to a censoring time). The key to our method is to treat the test type and the number of previous tests of that type that were taken as a ‘time-varying’ covariate. We then made two assumptions:
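The first two steps (ignoring everything after the first FP, then listing the tests taken in order of appearance) can be sketched in Python. This is a minimal illustration, not the authors' software: the function name and the data layout (a chronological list of (test type, result) pairs, with None for a missing test) are assumptions. The two example records follow Persons 1 and 2 of Table 2a.

```python
from collections import Counter
from typing import List, Optional, Tuple

# (test type, result): 1 = FP, 0 = no FP, None = test missing
Record = Tuple[str, Optional[int]]


def survival_sequence(records: List[Record]) -> Tuple[List[str], str]:
    """Label each test taken up to the first FP and report how the sequence ended."""
    taken = Counter()  # number of tests of each type taken so far
    labels = []
    for test_type, result in records:
        if result is None:
            continue  # missing tests are skipped, so later tests move forward
        taken[test_type] += 1
        labels.append(f"{test_type}{taken[test_type]}")
        if result == 1:
            return labels, "Failed"  # results after the first FP are ignored
    return labels, "Censored"


# Person 1 of Table 2a: X-ray missing in round 0, nothing recorded after round 1 X-ray.
person1 = [("P", 0), ("X", None), ("D", 0), ("F", 0), ("P", 0), ("X", 0)]
# Person 2 of Table 2a: first FP on the second PSA taken.
person2 = [("P", None), ("X", 0), ("D", 0), ("F", None),
           ("P", 0), ("X", 0), ("D", 0),
           ("P", 1), ("X", 0), ("D", 1)]

seq1 = survival_sequence(person1)
seq2 = survival_sequence(person2)
```

The label attached to each test (for example P2) carries the time-varying covariate: the test type together with one plus the number of previous tests of that type.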

Assumption 1

The probability of an FP on a given test conditional on no FPs in any of the previous tests (regardless of type) depends only on the type of test (PSA, DRE, X-ray or FSG) and the number of previous tests of that type that were taken.

Assumption 2

The probability a test is censored can depend on the number and the type of previous tests taken, but not on any information about the censored tests.

Under Assumption 1, the quantity of interest is the hazard $\theta_{rt}$, which is the probability that a subject has a first FP (among all tests regardless of the type) on the r-th test of type t.

The third step in our analytical method was estimation of the hazard. As derived in Appendix A, under Assumptions 1 and 2 (the latter analogous to non-informative censoring), the maximum likelihood estimate of the hazard is

\hat{\theta}_{rt} = \frac{d_{rt}}{N_{rt}}, \qquad (1)

where $N_{rt}$ = number of subjects who took the r-th type t test and did not have an FP on any previous test,

$d_{rt}$ = number of subjects who had a first FP on the r-th type t test.

Estimation of the hazard in Equation (1) is easily accomplished by taking the modified raw data in Table 2b and 'shifting' the data for each type of test to the left, according to Table 2c, which fills in some of the missing entries. This gives Table 2d. In Table 2d, the estimated hazard for each column, corresponding to a particular type of test and its ordered occurrence, equals the number of FPs in the column divided by the number of non-missing values in the column.
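The shifting step amounts to counting, for each (occurrence, test type) pair, how many subjects took that test while still free of FPs and how many had their first FP on it. The Python sketch below illustrates this under stated assumptions: the function name and data layout are hypothetical, and the three example subjects are invented for illustration and are not the rows of Table 2.

```python
from collections import Counter
from fractions import Fraction


def estimate_hazards(people):
    """Return {(r, test type): d_rt / N_rt} from chronological (type, result) lists."""
    at_risk = Counter()   # N_rt: took the r-th test of type t with no earlier FP
    first_fp = Counter()  # d_rt: first FP occurred on the r-th test of type t
    for records in people:
        taken = Counter()
        for test_type, result in records:
            if result is None:
                continue  # missing test: later tests of this type shift left
            taken[test_type] += 1
            key = (taken[test_type], test_type)
            at_risk[key] += 1
            if result == 1:
                first_fp[key] += 1
                break  # censor everything after the first FP
    return {key: Fraction(first_fp[key], n) for key, n in at_risk.items()}


# Invented example with two test types, P and X (1 = FP, 0 = no FP, None = missing).
example_people = [
    [("P", 0), ("X", 0), ("P", 0)],
    [("P", None), ("X", 0), ("P", 1), ("X", 0)],
    [("P", 0), ("X", 1)],
]
example_hazards = estimate_hazards(example_people)
```

In this invented example, three subjects take a first P test and one of them has a first FP on it, so the estimated hazard for the first P test is 1/3; only one subject reaches a second P test, with no FP, giving a hazard of 0.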

The last step was to compute the estimated risk of at least one FP, called the cumulative risk of an FP, for the scheduled regimen of screenings. The estimated cumulative risk of an FP equals one minus the estimated risk of no FPs,

\widehat{\mathrm{Risk}} = 1 - \prod_{\{r,t\}} (1 - \hat{\theta}_{rt}), \qquad (2)

where $\{r,t\}$ = {(1, PSA), (1, X-ray), (1, DRE), (1, FSG), (2, PSA), (2, X-ray), (2, DRE), (3, PSA), (3, X-ray), (3, DRE), (4, PSA), (4, X-ray), (4, DRE), (4, FSG)}. As derived in Appendix B, the estimated asymptotic variance of $\widehat{\mathrm{Risk}}$ is

\widehat{\mathrm{var}}(\widehat{\mathrm{Risk}}) = (1 - \widehat{\mathrm{Risk}})^{2} \left[ \sum_{\{r,t\}} \frac{\hat{\theta}_{rt}}{(1 - \hat{\theta}_{rt}) N_{rt}} \right]. \qquad (3)

The asymptotic 95% CI is $\widehat{\mathrm{Risk}} \pm 1.96 \sqrt{\widehat{\mathrm{var}}(\widehat{\mathrm{Risk}})}$.
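The cumulative risk, its variance and the interval can be computed directly from the counts in Table 1. The Python sketch below is a minimal illustration of those formulas; the function and variable names are assumptions, and the counts are copied from the "Number at risk" and "Number of first FP" columns of Table 1.

```python
import math

# (ordered number, test, number at risk N_rt, number of first FPs d_rt) from Table 1
TABLE1_COUNTS = [
    (1, "PSA", 32533, 1768), (1, "X-ray", 32243, 2966),
    (1, "DRE", 27918, 1735), (1, "FSG", 24765, 5890),
    (2, "PSA", 19790, 416), (2, "X-ray", 20457, 996), (2, "DRE", 18371, 699),
    (3, "PSA", 16622, 351), (3, "X-ray", 17095, 695), (3, "DRE", 15530, 545),
    (4, "PSA", 12661, 253), (4, "X-ray", 10051, 389),
    (4, "DRE", 11864, 325), (4, "FSG", 3013, 404),
]


def cumulative_fp_risk(rows, z=1.96):
    """Return (risk, (lower, upper)) using the product and variance formulas."""
    no_fp = 1.0      # running product of (1 - hazard)
    var_sum = 0.0    # running sum of hazard / ((1 - hazard) * N)
    for _, _, n, d in rows:
        theta = d / n
        no_fp *= 1.0 - theta
        var_sum += theta / ((1.0 - theta) * n)
    risk = 1.0 - no_fp
    se = math.sqrt(no_fp ** 2 * var_sum)  # (1 - risk)^2 times the bracketed sum
    return risk, (risk - z * se, risk + z * se)


risk, ci = cumulative_fp_risk(TABLE1_COUNTS)
```

With these counts the point estimate comes to roughly 0.605, in line with the cumulative risk reported for men.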

Table 2:

Hypothetical example of data analysis

	Round 0				Round 1			Round 2			Round 3
	P	X	D	F	P	X	D	P	X	D	P	X	D	F
(a) Hypothetical raw data
Person 1	0		0	0	0	0
Person 2		0	0		0	0	0	1	0	1		0	0	1
Person 3	0	0	0	0				1	0	0	0	0	1	0
Person 4		0	0	0		0	0	0	1	0	1	0	0	0
(b) Data ignoring censoring after first FP
Person 1	0		0	0	0	0
Person 2		0	0		0	0	0	1
Person 3	0	0	0	0				1
Person 4		0	0	0		0	0	0	1
(c) Survival formulation
	Order of appearance of all tests														Last observation
	1	2	4	5	8	10	11	12	13	14
Person 1	P1	D1	F1	P2	X1										Censored
Person 2	X1	D1	P1	X2	D2	P2									Failed
Person 3	P1	X1	D1	F1	P2										Failed
Person 4	X1	D1	F1	X2	D2	P2	X1								Failed
(d) Estimation algorithm based on (b) and (c)
	Occurrence of particular type of test
	First				Second			Third			Fourth
	P	X	D	F	P	X	D	P	X	D	P	X	D	F
Person 1	0	0	0	0	0
Person 2	0	0	0		1	0	0
Person 3	0	0	0	0	1
Person 4	1	0	0	0	0	0
No. first FP	1	0	0	0		2	0	0
No. at risk for first FP	4	4	4	3	3	2	2
Estimated hazard	1/4	0	0	0	2/3	0	0

P, prostate-specific antigen; X, X-ray; D, digital rectal exam; F, flexible sigmoidoscopy; 0, no FP; 1, FP; blank, missing. The number after P, D, X or F is the ordered number of that test.

Results

The estimated hazards for the primary scenario are presented in Table 1. Within most rounds, the number taking each test was fairly similar except for FSG on round 3, owing to changes in study protocol. For this primary scenario, the estimated probability of an FP in 14 tests was 60.5% with 95% CI of (59.3%, 61.6%). As a sensitivity analysis, we investigated a secondary scenario that reversed the order of DRE and X-ray, and obtained the same estimates to three significant figures.
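As a rough illustration of how the hazards combine, the following worked arithmetic applies the product formula to only the first occurrence of each of the four tests, using the rounded hazards for PSA, X-ray, DRE and FSG in Table 1. It is a sketch for orientation, not an additional result of the analysis.

```latex
1 - (1 - 0.0543)(1 - 0.0920)(1 - 0.0621)(1 - 0.2378)
  = 1 - (0.9457)(0.9080)(0.9379)(0.7622)
  \approx 1 - 0.614
  = 0.386
```

On this arithmetic, the first occurrence of the four tests alone would correspond to a risk of roughly 39% of at least one FP, with the first FSG contributing the largest single hazard.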

Discussion

This paper involved various methodological considerations and innovations. To accommodate missing data, we used an ordered list of tests taken for each person, a common technique in methods for estimating the cumulative risk of FPs with a single type of test,^1–4 and applied it here to multiple types of tests. By treating missing tests as censoring at the end of the ordered list, we avoided the added complexity of modelling the probability that a test is missing within a set of tests. An implicit assumption of this ordering approach is that the time between tests is not predictive of whether or not a later test was censored or was an FP.

Importantly, we allowed censoring to depend on the previous history of tests. In contrast, some methods for estimating the cumulative risk of FPs unrealistically assume a constant probability of censoring.^1,2 Nevertheless, bias could arise if a subset of subjects, such as older subjects, were both more likely to be censored and more likely to have an FP after censoring. This bias can be avoided by stratifying the analysis by this subset.⁷

A key innovation for combining data from different types of tests is the introduction of a covariate that is analogous to a time-varying covariate in survival analysis. From this perspective, the formulation is an extension of the Kaplan-Meier approach to a time-varying covariate. In this study, our time-varying covariate was the type of test and number of previous tests of that type that were taken.

In a companion paper⁶, the time-varying covariate was the type of test and round. Round is likely less relevant than the number of previous tests because the first test, regardless of round, may have a different FP rate than later tests as there is no background information to modify clinical interpretation. Our estimates of cumulative risk of FPs among men in the PLCO study were similar to those in the companion paper. The likely reason is that there was little data missing in the first round, so that the first round corresponded to the first test in most instances.

Although the focus of the paper is on methodology, some discussion is warranted about clinical definitions and assumptions. For this analysis, advanced adenomas (including those with villous histology, severe cellular dysplasia or ≥1 cm diameter) were considered true-positives. However, it is possible that the identification of smaller lesions (which are classified as FPs), which would then lead to their removal, could prevent cancer. Such FPs could in theory confer eventual benefit. But because the natural history is unknown, it could equally be the case that these small lesions would never progress to cancer, so that identifying and removing them could confer net harm owing to bleeding, infection or colon perforation. Here, we take the view that identification of the smaller lesions is detrimental, and such findings are counted as FPs.

We defined FPs based on no cancer in a three-year interval after screening. The longer the time interval, the more likely one will correctly detect a cancer missed on screening, but also the more likely one will incorrectly attribute a cancer arising after screening to the earlier positive test, so that a genuine FP is not counted as one. This trade-off depends on the natural history of the disease.

Our results have important implications. Recommendations for cancer screening need to be based on both the expected cancer mortality reduction, if any, owing to screening and the harm from FPs. Separate estimates of FP rates for particular tests do not capture the harm from an entire screening programme. For example, a high cumulative FP rate in a multiphasic screening programme may cause screening participants to avoid further screening with other types of tests in the same programme, even though there might still be a net benefit of screening.

Estimates of the burden of FP rates of a specific test are often made in isolation, test-by-test, sometimes by organizations or specialties with a particular interest in a specific disease. This can lead to a plethora of screening recommendations, each associated with separate estimates of FP rates. The method used in this paper can provide a broader estimate of the impact of such recommendations.

Footnotes

Appendix A

Appendix B

References

1. Shapiro, Venet, Strax. Ten- to fourteen-year effect of screening on breast cancer mortality. J Natl Cancer Inst 1982; 69: 349-55

2. Tabar, Fagerberg, Gad. Reduction in mortality from breast cancer after mass screening with mammography. Randomised trial from the breast cancer screening working group of the Swedish National Board of Health and Welfare. Lancet 1985; 1: 829-32

3. Dilhuydy, Barreau. The debate over mass mammography: is it beneficial for women? Eur J Radiol 1997; 24: 86-93

4. Gøtzsche, Olsen. Is screening for breast cancer with mammography justifiable? Lancet 2000; 355: 129-34

5. Warren. Screening women at high risk of breast cancer on the basis of evidence. Eur J Radiol 2001; 39: 50-9

6. Parvinen, Helenius, Pylkkänen. Service screening mammography reduces breast cancer mortality among elderly women in Turku. J Med Screen 2006; 13: 34-40

7. Jonsson, Bordás, Wallin. Service screening with mammography in northern Sweden: effects on breast cancer mortality - an update. J Med Screen 2007; 14: 87-93

8. Bodo, Dobrossy, Liszka. Cancer screening in Hungary: World Bank supported model programs. [In Hungarian: Rákszúrés Magyarországon: Modellprogramok világbanki támogatással]. Orv Hetil 1997; 138: 1801-4

9. Boncz. Organized nationwide breast cancer screening programme was introduced in Hungary in 2002. Swiss Med Wkly 2006; 136: 328

10. Frede. Opportunistic breast cancer early detection in Tyrol, Austria 1996-2004. Is a mammography-screening program necessary? Eur J Radiol 2005; 55: 130-8

11. Feldstein, Vogt, Aickin. Mammography screening rates decline: a person-time approach to evaluation. Prev Med 2006; 43: 178-82

12. European Commission. European Guidelines for Quality Assurance in Breast Screening and Diagnosis. 4th edn. Luxembourg: Office for Official Publications of the European Communities, 2006

13. Boncz, Sebestyén, Döbrössy. The organization and results of first screening round of the Hungarian nationwide organised breast cancer screening programme. Ann Oncol 2007; 18: 795-9

14. Jensen, Olsen, von Euler-Chelpin. Do nonattenders in mammography screening programmes seek mammography elsewhere? Int J Cancer 2005; 113: 464-70

15. Beemsterboer, de Koning, Looman. Mammography requests in general practice during the introduction of nationwide breast cancer screening, 1988-1995. Eur J Cancer 1999; 35: 450-4

16. Bulliard, De Landtsheer, Levi. Results from the Swiss mammography screening pilot programme. Eur J Cancer 2003; 39: 1761-9

17. Rohlfs, Borrell, Plasencia. Social inequalities and realisation of opportunistic screening mammographics in Barcelona (Spain). J Epidemiol Community Health 1998; 52: 205-6

Estimating the cumulative risk of a false-positive under a regimen involving various types of cancer screening tests
