Abstract

Xu and Prorok [1] point out that our test for independence [2] between the outcomes of subsequent screens is not equivalent to testing independence of X1 and X2. It is correct that we tested the hypothesis of independence between subsequent screens by testing an implication of this independence. When it is not possible or straightforward to test a hypothesis directly, it is common statistical practice to test an implication of the hypothesis instead. We compared the cumulative false positive risk expected under the assumption of independence with the observed cumulative false positive risk and tested whether these two probabilities were equal. Acceptance by a statistical test of an implication of a hypothesis is of course not the same as acceptance of the hypothesis itself, but it is a way to strengthen or weaken one's belief in the hypothesis. If independence did not hold, it would be very odd that the expected cumulative false positive risks in two independent mammography screening programmes resemble the observed cumulative false positive risks so remarkably closely. Our belief in the hypothesis of independence was further strengthened by the fact that it seems reasonable in view of the radiologists' practice of comparing new with old mammograms.
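The comparison described above can be sketched in a few lines: under independence, the cumulative false positive risk over several screening rounds is 1 − ∏(1 − p_i). The per-screen rates, the observed cumulative risk, and the sample size below are hypothetical illustration values, not the published figures.

```python
import math

def cumulative_fp_risk(per_screen_rates):
    """Cumulative false positive risk implied by independence of
    the outcomes of subsequent screens: 1 - prod(1 - p_i)."""
    survive_all = 1.0
    for p in per_screen_rates:
        survive_all *= (1.0 - p)
    return 1.0 - survive_all

# Hypothetical per-screen false positive rates for three screening rounds.
rates = [0.05, 0.05, 0.05]
expected = cumulative_fp_risk(rates)  # risk expected under independence

# Compare with a hypothetical observed cumulative risk using a
# normal-approximation z-statistic for one proportion (n = women screened).
observed, n = 0.148, 5000
se = math.sqrt(expected * (1.0 - expected) / n)
z = (observed - expected) / se
print(f"expected {expected:.4f}, observed {observed:.4f}, z = {z:.2f}")
```

A z-value near zero is consistent with the hypothesis p = p*; the test, of course, addresses only this implication of independence, as discussed above.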
Xu and Prorok provide an example with very few observations to illustrate that our method does not prove independence. With so few observations, the 95% confidence interval becomes very broad, so the hypothesis p = p* will be accepted for a very wide range of values of p, including values arising from data without independence between subsequent screens. We do not question the theoretical correctness of this example, but it is far removed from the reality of evaluating mammography screening programmes. Such an evaluation normally includes observations of subsequent screens from thousands of women, and thus yields narrow confidence intervals and strengthens the test of p = p*.
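The effect of sample size can be made concrete with the usual normal-approximation confidence interval for a proportion, whose half-width shrinks as 1/√n. The proportion and the two sample sizes below are illustrative assumptions, chosen only to contrast a toy example with an evaluation covering thousands of women.

```python
import math

def ci_half_width(p, n, z=1.96):
    """Half-width of the normal-approximation 95% CI for a proportion."""
    return z * math.sqrt(p * (1.0 - p) / n)

p_hat = 0.25  # hypothetical observed cumulative false positive risk
small = ci_half_width(p_hat, 20)      # a toy data set, as in the example
large = ci_half_width(p_hat, 20000)   # thousands of women, as in practice
print(f"n=20: +/-{small:.3f}; n=20000: +/-{large:.4f}")
```

With equal p and z, the ratio of the two half-widths is exactly √(20000/20) ≈ 31.6, which is why a broad interval in a tiny example says little about the method's behaviour on real screening data.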
For the HIP Mammography Screening Programme, Xu et al [1] calculated the cumulative false positive risk to be 24.5%. Using the same data and our method, we calculated the cumulative false positive risk to be 23.6%. Perhaps the radiologists in the HIP Programme did not consistently compare new mammograms with old mammograms, whereby independence between outcomes from subsequent screens would be lost. This could explain the small difference between the estimates, but in any case the difference is probably of no practical importance to the women to whom this estimate is provided. We therefore find that for real-life large data sets with narrow 95% confidence intervals, our method provides a valid, pragmatic test of independence.
