Abstract
Despite overwhelming evidence of a major reduction in deaths, the debate about the efficacy of breast cancer screening has continued for over 50 years. The poor results of the Canadian National Breast Screening Studies (CNBSS) have been used to challenge the benefits shown by the other randomized, controlled trials. They continue to be used in assessing the value of breast cancer screening despite their unblinded allocation process, which first identified women with breast abnormalities and then assigned them on open lists, allowing for nonrandom assignment, compromising the trials and rendering their results unreliable. Statistically significantly more women with advanced cancers were assigned to the screening arm of CNBSS1. The early results of CNBSS1 showed an excess of deaths in the screening arm, and an (otherwise inexplicable) greater than 90% five-year survival for the control women. The failure of random assignment also explains why the clinically evident cancers were larger in the screening arms than the cancers in the “usual care” arms, despite the fact that the screened women underwent very intense clinical breast examinations each year by highly skilled examiners. The claim that balanced demographic factors prove random assignment is also false: nonrandom allocation of a hundred or more women with clinically evident abnormalities would have no detectable influence on the distribution of demographic factors. In summary, policy decisions about mammography should not be influenced by the results of the CNBSS.
The randomized trials of mammographic screening for breast cancer have been subject to repeated review over the last 20 years.1,2 The primary results of the trials vary from a substantial reduction in breast cancer mortality, with the offer of screening, to no reduction at all, the Canadian National Breast Screening Studies (CNBSS) being examples of the latter. Despite a number of well-documented issues with the potential to invalidate the results of the CNBSS,3 these studies continue to be cited as reliable evidence against screening, and are often given equal value when considered alongside other trials of mammography in screening guidelines.4
Major concerns have been raised about the CNBSS over the decades since their inception,5,6 but awareness of these still appears to be limited.4 The purpose of this article is to collate the major concerns in a single commentary.
Some history
In the late 1970s and early 1980s, the results of the Health Insurance Plan (HIP) of New York randomized controlled trial (RCT), showing that early detection saved lives,7 led to additional RCTs of screening. Since clinical breast examination (CBE) was credited, rightly or wrongly, with much of the earlier detection in the HIP study, questions arose as to whether CBE alone might be sufficient for reducing breast cancer deaths. In their 1992 paper,8 Miller et al. state that one of the objectives of CNBSS2 was “to evaluate the efficacy of annual mammography over and above annual physical examination of the breasts and the teaching of breast self-examination among women aged 50 to 59 years on entry.”
Originally the CNBSS was called the National Breast Screening Study of Canada (NBSS). It actually consisted of two separate studies with different protocols. These were, subsequently, and sometimes confusingly, evaluated together as if they were a single trial.9 Doing so has obscured some of the fundamental issues and problems with the individual trials.
CNBSS1 recruited just under 50,000 women aged 40 to 49. The protocol provided everyone with a CBE at the outset and then women allocated to the screening arm received a mammogram and CBE every year for a total of five years. The control women, after the initial CBE, had nothing more than the “usual care” in the community.
CNBSS2 recruited almost 40,000 women aged 50–59. Women assigned to the screening arm were provided a CBE and mammogram every year for the five years of the trial while the control women all received a CBE every year.
The major concerns fall into, essentially, two categories:
- issues with design, potential randomization subversion and consequent bias with respect to comparison of major endpoints between intervention and control groups; and
- issues of quality of the intervention itself.
Design and randomization issues
Every participant in both trials underwent a CBE prior to being allocated to the screening or control groups. Women with lumps and other signs and symptoms were identified prior to randomization. This violates a fundamental principle of RCTs, because the results of the CBE were known to those who assigned women to the screening or control group, which is likely to have introduced bias (through lack of “blinding”). Furthermore, in a trial of screening, it would be expected that women with clinically evident cancers at baseline would be excluded. However, these women were allowed to participate in the hope of increasing the statistical power by including more women with cancer (personal communication from Anthony Miller, MD), despite the fact that they could not benefit from screening. This might have simply diluted the results, but by knowing the results of the clinical examination, the coordinators also had the opportunity to assign them nonrandomly to the mammography arms.
There should have been no opportunity to compromise the random allocations, but women were assigned on open lists: the unblinded study coordinators could, potentially, assign any woman to whichever group they wanted.
Evidence that allocation was compromised can be seen in CNBSS1, where there was a statistically significant excess of women with four or more positive axillary lymph nodes who were assigned to the screening arm. There were 19 such women in the mammography screening arm versus only 5 in the control arm.10,11 The explanation has been made that “mammography finds more of everything,” ignoring the fact that 17 out of the 19 were clinically evident cancers.
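The significance of this imbalance can be checked with a simple calculation. As a rough sketch (my own illustration, not the trialists’ analysis; it ignores the slightly unequal arm sizes and treats each of the 24 node-positive cases as equally likely to land in either arm), the probability of a 19-to-5 or more extreme split follows from the binomial distribution:

```python
from math import comb

def tail_prob(n: int, k: int, p: float = 0.5) -> float:
    """One-sided probability of observing k or more 'successes'
    out of n under a binomial(n, p) model."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# 24 women with four or more positive axillary nodes;
# 19 were assigned to the screening arm, 5 to the control arm.
one_sided = tail_prob(24, 19)   # ≈ 0.0033
two_sided = 2 * one_sided       # ≈ 0.0066
```

Under this simplified model the two-sided probability of so extreme a split is well below 0.01, consistent with the reported statistical significance.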
Although there are other issues with CNBSS2 (see section on mammography quality below), the significant imbalance in advanced cancers at baseline was only seen in CNBSS1.
In CNBSS1, there were more breast cancer deaths in the screening arm than the control arm for the first seven years of the trial.12 The most likely explanation, as noted above, is that women with advanced cancers were assigned disproportionately to this arm. Supporting this is the fact that the five-year survival for the control women in CNBSS1 was greater than 90% at a time when the five-year survival in Canada was only 75%. Certainly, more health-conscious women are likely to volunteer for trials, but it is also likely, given the above and the striking difference in survival between the trial controls and breast cancer patients in the general population, that women with advanced cancers who would have been randomly assigned to the control arm were instead (nonrandomly) assigned to the screening arm by study staff. This would produce the striking survival results in the control arm and the relative excess of deaths among the screened women. There is no other biologically sound explanation.
The investigators claim that, because there was no evidence of an imbalance in the demographics (that is, in the two trial populations as a whole, the distribution of breast cancer risk factors was similar between intervention and control groups8,12), allocation must have been random. This was recently repeated in a paper by the Principal Investigator,13 who stated that “the participants were randomly assigned after completing a questionnaire on risk factors for breast cancer.” However, allocating the very small minority of women (a few dozen, or even a few hundred) with “lumps” or other signs of possible breast cancer to the screening arms would make no discernible difference to the summarized demographic characteristics in the entire randomized population.
The imbalanced allocation would also bias secondary endpoints such as breast cancer incidence, with resulting overinterpretation as in the investigators’ 25-year report.9 The trialists’ claim that there were 22% more cancers detected in the screening arm, representing considerable “overdiagnosis,” is belied by their own data. Their Table 1 shows that there were 3250 cancers diagnosed among the intervention group women and 3133 among the controls. This is a difference of only 117, or less than 4%, which can readily be explained by imbalanced assignment.
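The arithmetic behind the “less than 4%” figure is straightforward; the sketch below simply reproduces it from the Table 1 counts quoted above:

```python
intervention = 3250  # cancers diagnosed among the screening (intervention) arm women
control = 3133       # cancers diagnosed among the control arm women

excess = intervention - control      # 117 more cancers in the screening arm
relative_excess = excess / control   # ≈ 0.037, i.e. under 4%
```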
In a more recent analysis,14 the CNBSS provided the numbers of cancers detected both during screening and after multiple periods following screening. For women aged 40–49 (Table 1A), there were 59 more women diagnosed with breast cancer in the screening arm than among the controls at the end of the screening phase of the trial. However, by 20 years, this excess had risen to 103. This means that more than 40% of the excess cancers were found after screening stopped. Unless the women in the screening group continued to be screened and the control women did not participate, this is further evidence that breast cancer incidence was not the same in both groups, again suggesting that they were not randomly allocated. Whatever the explanation, the excess of 103 cancers cannot be a reliable estimate of overdiagnosis.
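The “more than 40%” figure follows directly from the two excess counts just cited; as a quick check:

```python
excess_end_of_screening = 59  # excess cancers in the screening arm when screening ended
excess_at_20_years = 103      # excess cancers in the screening arm at 20 years

# Excess cancers that appeared only after screening had stopped,
# and the fraction of the 20-year excess they represent.
found_after_screening = excess_at_20_years - excess_end_of_screening  # 44
fraction_after = found_after_screening / excess_at_20_years           # ≈ 0.43
```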
There is further concern about an allocation imbalance that is obscured by the fact that the data from the two trials were combined in the 25-year follow-up.9 In Table 2 of that follow-up it is reported that the average size of the clinically detected cancers was the same in both arms of the combined trials, at 2.1 cm. In CNBSS2 all the participants (both arms) had a rigorous CBE each year, so it is not surprising that there would be no difference in the size of the clinically detected cancers in CNBSS2. However, in CNBSS1, only the women assigned to the screening arm had a CBE each year by highly trained nurse examiners, while the women in the control arm of CNBSS1, after the prevalence screen, had “usual care” in the community. It is hard to imagine that a rigorous annual CBE would detect cancers at the same size as would be likely were women to have their cancers detected in a community setting. Combining the data from the two trials and not reporting them separately may well have obscured the fact that the cancers detected by the rigorous CBE in CNBSS1 were likely larger than the cancers detected by “usual care.” If allocation had been random, then the clinically detected cancers in CNBSS1 found by highly trained examiners should have been smaller than those in the control arm.
The data in Table 2 also show that only 32% of the cancers in the screening arms were detected by mammography alone. Not only is this further evidence of the poor quality of the mammography, but we can crudely estimate that if the average size of all cancers detected in the screening arm (mammography + CBE) was 1.9 cm and by mammography alone was 1.4 cm, and the ratio of detection was mammography 32% and CBE 68%, then the average for the CBE-detected cancers would be expected to have been 2.3 cm, which is larger than the reported 2.1 cm. Without knowing the actual sizes and numbers detected in CNBSS1 we cannot be certain, but the numbers suggest that there were cancers in the screening arm of CNBSS1 that were larger than those in the “usual care” arm, again suggesting an allocation imbalance.
In response to concerns about nonrandom allocation of the crucial subgroup of women with palpable, advanced cancer at recruitment, the argument has been made that the CNBSS underwent external review.15 However, no outside reviewer interviewed the study coordinators or staff to determine whether there had been well-meaning interference in the assignment to control or study group, so the integrity of the allocation process can never be assured. The reviewers, Drs Bailar and MacMahon, did confirm that the allocation was unblinded following a CBE. They did not find evidence of allocation subversion, but they reviewed the number of erasures on the assignment sheets in only 3 of the 15 centers. Furthermore, since allocation was on open lists, there would have been no need to erase an entry. Thus, the review was at best incomplete, but it did find serious violations of randomized trial principles.
Quality of the intervention
Not only was random allocation in the trials corrupted, but the quality of the mammography was poor to unacceptable for much of the trial. If a treatment trial used an outdated therapeutic agent, it would never reach publication. It is challenging to give these screening trials any credence when it is clear, in the words of the CNBSS’s own reference physicist, that he “identified many concerns regarding the quality of mammography carried out in some of the NBSS screening centers. That quality [in the NBSS] was far below state of the art, even for that time (early 1980s).”16 Two successive outside consultants to the CNBSS, Wendie Logan, MD, followed by Stephen Feig, MD, resigned over the failure of the Principal Investigators to improve the mammograms.17 Laszlo Tabar, MD, brought in by Dr Feig, confirmed the poor quality of the mammography (personal communication, Laszlo Tabar, MD). It was not until Edward Sickles, MD, brought in as a third external consultant, threatened to resign, that there was internal agreement to an objective review. Myron Moskowitz, MD, Douglas Sanders, MD, and I were invited to review cases organized by the CNBSS, and we found that, for much of the trial, the images were “poor to unacceptable.”18,19
Quality issues include:

- There had been no training for the technologists, and minimal if any training for the radiologists.20 This, combined with the poor technical quality of the images, contributed to the poor outcomes in the study arms. It should be noted that none of the radiologists have ever defended the trials.
- The straight mediolateral projection was used for five out of the eight years; this projection misses cancers detectable on the mediolateral oblique view, which was not used.17
- Old, outdated (even secondhand) mammography equipment was used (in Vancouver the device was 11 years old).17
- There was no phototiming (automatic exposure control) in 5 of the 15 centers,17 so that images could be under- or overexposed.
- Grids to clean up scatter, which can hide cancers, were not permitted for much of the trial.21
- Recommendations for biopsy were not always heeded.
Conclusion
Given the above, it is inappropriate to accept the findings of the CNBSS as an unbiased comparison of mammography with an unscreened group, and reviews of all breast screening trials should always acknowledge these issues and, where relevant, examine the effects on cancers detected, mortality and other endpoints after excluding the CNBSS. It is also worth noting that, with the exception of the HIP trial, no other mammography trial incorporated routine CBE within its protocol, making this another reason to treat the CNBSS separately.
The problem is that the study design and execution made undetectable, nonrandom assignment possible. None of the coordinators had experience with RCTs. In view of the imbalance noted above, it is likely that some naively assigned women with clinical signs or symptoms to the screening arms to ensure that these women got mammograms. The independent reviewers of the CNBSS should have interviewed the coordinators.
A treatment trial that used an outdated therapeutic agent, identified women with large, advanced cancers prior to allocation, and then allocated them on open lists, resulting in more advanced cancers in the control arm, would have difficulty passing peer review. Because of the above issues of both allocation and quality of mammography, the best that can be said about the CNBSS is that they failed to find an effect of screening. They provide no reliable evidence against mammographic screening. Guideline groups, systematic reviewers and peer reviewers appear to have been unaware of the shortcomings of the CNBSS in the past. Reviews and guidelines recommending against screening, and more particularly recommending against screening in specific age groups,4,22 have given substantial weight to the CNBSS. Since there is no way to correct for the fundamental shortcomings in design and quality of the intervention, it is inappropriate (if not incorrect) to use the results from these compromised trials in deciding breast cancer screening guidelines and as a basis for reducing access to screening.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
