Abstract
Objectives
To characterize and quantify the differences in the number of cases and breast cancer deaths in the Swedish W-E Trial compared with the Swedish Overview Committee (OVC) summaries and to study methodological issues related to trials in secondary prevention.
Setting
The study population of the W-E Trial of mammography screening was included in the first (W- and E-county) and the second (E-county only) OVC summaries of all Swedish randomized mammography screening trials. The OVC and the W-E Trial used different criteria for case definition and for determination of cause of death.
Method
A Review Committee compared the original data files from W- and E-county with those of the first and second OVC. The reason for each discrepancy was determined individually for all non-concordant cases or breast cancer deaths.
Results
Of the 2615 cases included by the W-E Trial or the OVC, there were 478 (18%) disagreements. Of these, 82% were due to inclusion/exclusion criteria and 18% to disagreement about cause of death or vital status at ascertainment. For E-county, the OVC inclusion rules and register-based determination of cause of death (2nd OVC), rather than individual case review (W-E Trial and 1st OVC), reduced the estimated effect of screening; for W-county, the difference between the original trial and the OVC was modest.
Conclusions
The conclusion that invitation to mammography screening reduces breast cancer mortality remains robust. Disagreements were mainly due to study design issues; disagreements about cause of death were a minority. When secondary research does not adhere to the protocols of the primary research projects, the consequences of such design differences should be investigated and reported. Register linkage of trials can add follow-up information. The precision of trials of modest size is enhanced by individual monitoring of case status and of outcome status, such as determination of cause of death.
Introduction
In 1987 the Swedish Cancer Society set up an Overview Committee (OVC) to review all the randomized mammography trials in Sweden, the W-E Trial being one of them. The OVC performed two overviews (hereafter called the 1st and the 2nd OVC) by collecting data from all four Swedish mammography trials in a uniform way. However, the two overviews determined cause of death differently: the 1st OVC used an endpoint committee, whereas the 2nd relied on registries only.
Concern was expressed about differences between the results reported for the W-E Trial by the original trialists and those reported by the Swedish Overview, particularly with respect to numbers of breast cancer deaths.7 It has been pointed out that such differences are an inevitable consequence of the different case definitions, determination of cause of death and eligibility criteria of the Swedish Overview1,8–10 as compared with the W-E Trial. These differences, however, raise both specific and general methodological issues related to the follow-up of large trials, or sets of trials, in secondary prevention. They include the following questions:
What is the magnitude of these differences at the individual rather than the aggregate level, and what proportions of the differences are due to the inclusion/exclusion criteria for cases in the Swedish Overview and to cause of death determination?
What are the reasons for individual differences between the original study and the overview with respect to breast cancer case definition/inclusion and cause of death?
What are the implications of these kinds of differences for endpoint definition in future studies in primary or secondary prevention?
In this paper, we report on a complete audit of breast cancer cases and deaths in the Swedish W-E Trial, as defined by the original trial investigators and by the Swedish Overview. We report the numbers of disagreements at the individual level and the reasons for them. We discuss their implications for the interpretation of the overview and the original trial (up to the end of 1993 for W-county and to the end of 1996 for E-county) and for the design and follow-up of future secondary prevention trials.
Background
The Swedish W-E Trial was initiated in 1977 in Kopparberg county (now Dalarna, referred to as W-county hereafter) and in 1978 in Östergötland county (E-county). Small geographical clusters were randomized to invitation to screening (Active Study Population, ASP) or no invitation (Passive Study Population, PSP) within 7 strata in W-county and within 12 strata in E-county. The strata were chosen so that clusters within strata were socioeconomically homogeneous. In W-county, randomization was approximately in the ratio 2:1 for ASP:PSP. In E-county, roughly equal numbers were randomized to the two groups. Entry of strata to the trial was staggered to allow the mammography facilities to cope with the workload. Year of birth cohorts were included to give an approximate age range of 40–74. For example, for a stratum whose randomization date was 1977, years of birth 1903–1937 were included. In total, 77,080 women were randomized to the ASP and 55,985 to the PSP. Details of the age and county breakdown of the study population are given elsewhere.11,12
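The allocation principle described above, randomization of whole geographical clusters within socioeconomically homogeneous strata, can be illustrated with a minimal Python sketch. The function name, the cluster identifiers and the seeded shuffle are hypothetical and do not reproduce the trial's actual randomization procedure; the sketch only shows why allocating clusters rather than individuals yields a ratio of women that is approximately, not exactly, 2:1 or 1:1.

```python
import random


def randomize_clusters(strata, asp_fraction, seed=0):
    """Allocate whole clusters to ASP or PSP separately within each stratum.

    strata: dict mapping a (hypothetical) stratum label to a list of cluster ids.
    asp_fraction: share of clusters per stratum invited (ASP), e.g. 2/3 for ~2:1.
    """
    rng = random.Random(seed)  # seeded so the allocation is reproducible
    allocation = {}
    for clusters in strata.values():
        order = list(clusters)
        rng.shuffle(order)  # shuffle within the stratum only
        n_asp = round(len(order) * asp_fraction)
        for i, cluster in enumerate(order):
            allocation[cluster] = "ASP" if i < n_asp else "PSP"
    return allocation


# Hypothetical strata of six clusters each.
strata = {s: [f"{s}-c{i}" for i in range(6)] for s in ("s1", "s2")}
w_style = randomize_clusters(strata, 2 / 3, seed=1)
print(sum(v == "ASP" for v in w_style.values()))  # 8 of 12 clusters invited
```

With `asp_fraction=2/3` the sketch mimics the W-county ratio and with `asp_fraction=0.5` the E-county ratio; because clusters differ in population size, the numbers of women per arm would only approximate the cluster ratio.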
The screening regime was single-view mammography, on average every 24 months in women aged 40–49 and every 33 months in women aged 50–74. At the end of 1984, a significant 31% reduction in breast cancer mortality was observed in the ASP.5 The PSP was then invited to screening. The trial was closed immediately on completion of the first round of screening in the PSP, and all cases in both arms diagnosed up to and including the end of the first screen of the PSP were followed up for death from breast cancer. In W-county, according to the local trial records, there were 694 breast cancer cases in the ASP and 359 in the PSP. In E-county, there were 732 breast cancer cases diagnosed in the ASP and 683 in the PSP.6 These cases included both in situ and invasive cancers diagnosed during the trial period.
The OVC defined breast cancer cases as women reported to the Swedish National Cancer Registry (NCR) with an invasive breast cancer only (excluding women with cancer in situ) during each trial's recruitment period, using the reporting date in the NCR as the date of diagnosis; women with an invasive breast cancer reported to the NCR before the trial start were excluded from the study base, although women diagnosed before 1958, when the NCR was established, could not be excluded. The OVC also accepted a woman as a breast cancer case when only a breast cancer death was registered in the Swedish Causes of Death Register (CDR), even if she was not registered in the NCR. Thus, the diagnosis could have occurred before the trial start, during the trial period or after the trial ended. Deaths from breast cancer were retrieved from the CDR to include all deaths in women with breast cancer as the underlying cause, according to the death certificate. Inclusion criteria in all analyses in the overview were based on exact age at randomization, as opposed to the W-E Trial, where inclusion was determined on the basis of year of birth. The OVC retrieved the original randomization file, based on the population register, from the IT co-ordinator responsible for data management in each county. The files were linked by each woman's unique National Registration Number to the corresponding Regional Tumour Registry, which provides the data for the NCR, to obtain verification and date of diagnosis, and to the CDR to obtain date and cause of death. Importantly, the 1st OVC included specialists who, independently of the W-E Trial, determined the cause of death of the breast cancer patients based on case records. The publications of the 1st OVC gave a relative risk estimate similar to that of the W-E Trial's local committee.2 In the 2nd OVC, the decision was made to use the NCR and CDR to determine cause of death instead of the specialist committee, because the combined relative risk using the register data was similar to that of the 1st OVC.13
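The effect of defining eligibility by year of birth rather than by exact age at randomization can be shown with a minimal Python sketch. The 1 June 1977 randomization date and the example birth dates are hypothetical, and the sketch assumes that both rules use the same 40–74 window; it is not the OVC's actual selection code.

```python
from datetime import date


def age_by_birth_year(birth, randomization):
    """Age as used by a birth-year rule: calendar-year difference only."""
    return randomization.year - birth.year


def exact_age(birth, on):
    """Completed years of age on a given date."""
    had_birthday = (on.month, on.day) >= (birth.month, birth.day)
    return on.year - birth.year - (0 if had_birthday else 1)


def eligible(age, lo=40, hi=74):
    """Assumed 40-74 window, applied identically to both age definitions."""
    return lo <= age <= hi


randomized = date(1977, 6, 1)  # hypothetical day and month; only the year is given
for birth in (date(1937, 9, 15), date(1902, 9, 1)):
    print(birth, eligible(age_by_birth_year(birth, randomized)),
          eligible(exact_age(birth, randomized)))
```

A woman born in September 1937 is 40 by the birth-year rule but 39 by exact age, so she is included only under the year-of-birth rule, whereas a woman born in September 1902 is included only under the exact-age rule. Such women appear in one study population but not in the other.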
The 1st OVC conducted a computerized follow-up of both the W-county and E-county data to 31 December 1993; this first evaluation round was the last in which the W-county data were included. The 2nd OVC continued data collection for E-county only, to 31 December 1996; this second evaluation round was the last in which the E-county data were included.2
The four most important differences between the W-E Trial design and the criteria of the 1st and 2nd OVC were:
The original trial defined inclusion and exclusion of women to the trial by year of birth and residence in the relevant geographical areas at the time of randomization. The 1st & 2nd OVC defined the population by year, month and day of birth.
The end-point committees of the W-E trial and the 1st OVC determined individual patient outcome by reviewing all clinical records as identified in the original trial data and in NCR and CDR data. The 2nd OVC used NCR and CDR data only.
The OVC included women as cases if the CDR reported a breast cancer death even if there was no report of a breast cancer diagnosis in the NCR. The W-E Trial included only those women who had a microscopically confirmed breast cancer diagnosed during the trial period.
The W-E Trial included all breast cancer cases (in situ and invasive), whereas the 1st and 2nd OVC both considered only invasive breast cancer cases reported to the NCR; they therefore excluded all women reported to the NCR with carcinoma in situ, and could not include cases that, through clerical errors, had not been reported from the clinics to the NCR.
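The case-definition differences in the last two items can be expressed as two inclusion functions. The Python sketch below is a simplification under stated assumptions: the record fields, the trial period end date and the example women are hypothetical, eligibility by age and residence is ignored, and the OVC rule is reduced to the two routes described above (an NCR report of invasive cancer within the period, or a CDR-only breast cancer death).

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Record:
    """Hypothetical, simplified record; not the layout of the actual trial or OVC files."""
    diagnosis_date: Optional[date]            # trial records, in situ or invasive
    microscopically_confirmed: bool
    ncr_invasive_report_date: Optional[date]  # None if never reported to the NCR as invasive
    cdr_breast_cancer_death: bool             # breast cancer as underlying cause in the CDR


def in_period(d, start, end):
    return d is not None and start <= d <= end


def we_trial_case(rec, start, end):
    """W-E Trial rule (simplified): microscopically confirmed breast cancer,
    in situ or invasive, diagnosed during the trial period."""
    return rec.microscopically_confirmed and in_period(rec.diagnosis_date, start, end)


def ovc_case(rec, start, end):
    """OVC rule (simplified): invasive cancer with an NCR reporting date in the
    period, or a CDR breast cancer death with no NCR report at all."""
    if in_period(rec.ncr_invasive_report_date, start, end):
        return True
    return rec.ncr_invasive_report_date is None and rec.cdr_breast_cancer_death


# Hypothetical trial period and women.
start, end = date(1977, 1, 1), date(1985, 6, 30)
late_report = Record(date(1985, 5, 1), True, date(1985, 9, 1), False)  # resembles category B
in_situ = Record(date(1980, 3, 1), True, None, False)                 # resembles category D
cdr_only = Record(None, False, None, True)                            # resembles category C
for rec in (late_report, in_situ, cdr_only):
    print(we_trial_case(rec, start, end), ovc_case(rec, start, end))
```

The three hypothetical women are each counted as cases by one rule but not the other, which is the mechanism behind most of the inclusion disagreements reported in the Results.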
Methods
In 2006 the Swedish Cancer Society set up a Joint Review Committee (JRC), including members of the 1st and 2nd OVC and the project leaders of the W-E Trial, to investigate the sources of disagreement between the results published by the trialists and by the OVC (the 1st OVC for W-county and the 2nd OVC for E-county data). The lists of women with breast cancer according to the trialists and the OVC were compared, and clinical records were retrieved where necessary. After each case in the two lists had been investigated independently by the trialists and the OVC, the JRC developed a classification scheme for the differences (Table 1). The records of breast cancer cases and deaths according to the local endpoint committee were compared with those of the OVC, using the subjects' Swedish National Registration Numbers for linkage. Deaths were compared through 1993 for W-county and through 1996 for E-county, as these were the end dates of the most recent Swedish Overview analyses to include each county.8,14 The JRC reviewed each disagreement between the two datasets with respect to either case definition or cause of death and determined its reason. Finally, on the basis of new information about migrated women and clerical errors, the trialists accepted some women as additional cancer cases in their trials. The original datasets with these additions are referred to as the JRC conclusive dataset.
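The comparison of the two lists can be sketched in Python as a set comparison on the linkage key. The identifiers, outcome labels and toy data below are hypothetical; in the study the key was the Swedish National Registration Number, and each disagreement was then classified by the JRC according to Table 1, which this sketch does not attempt.

```python
def compare_datasets(trial, ovc):
    """Count agreements and disagreements between two linked endpoint datasets.

    trial, ovc: dicts mapping a person identifier (standing in for the National
    Registration Number) to an outcome label such as "BCD", "DOC" or "alive".
    """
    both = trial.keys() & ovc.keys()
    trial_only = trial.keys() - ovc.keys()  # cases only in the trial records
    ovc_only = ovc.keys() - trial.keys()    # cases only in the overview records
    outcome_diff = {pid for pid in both if trial[pid] != ovc[pid]}
    return {
        "both": len(both),
        "trial_only": len(trial_only),
        "ovc_only": len(ovc_only),
        "outcome_disagreements": len(outcome_diff),
        "total_disagreements": len(trial_only) + len(ovc_only) + len(outcome_diff),
    }


# Toy example with hypothetical identifiers.
trial = {"A": "BCD", "B": "DOC", "C": "alive", "D": "BCD"}
ovc = {"A": "BCD", "B": "BCD", "C": "alive", "E": "BCD"}
print(compare_datasets(trial, ovc))
```

Inclusion disagreements correspond to the one-sided set differences, and outcome disagreements (cause of death or vital status) to mismatched labels among women present in both lists.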
Table 1. Classification of potential differences between the W-E Trial and the overview
NCR, Swedish National Cancer Registry; CDR, Swedish Causes of Death Register
Paired significance tests between OVC and W-E endpoints were carried out using McNemar's test.15 Associations of the likelihood of disagreement with age, county and trial arm were assessed using the chi-squared test. Relative risks and their 95% confidence intervals were calculated using Poisson regression.
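A minimal stdlib-only Python sketch of two of the methods named above. For McNemar's test only the discordant pairs matter: `b` women classed as breast cancer deaths by the W-E Trial but not by the OVC, and `c` the reverse; the continuity-corrected variant shown is one common choice and not necessarily the one used in the study. For a Poisson regression whose only covariate is a binary trial-arm indicator (with log person-years as offset), the maximum-likelihood rate ratio equals the ratio of the crude rates and its Wald standard error on the log scale is sqrt(1/d1 + 1/d0); any stratum or age adjustment would require a full regression fit. All counts below are hypothetical.

```python
import math


def mcnemar(b, c):
    """McNemar's test with continuity correction on the discordant pairs b and c.

    Returns (chi-squared statistic, p-value) on 1 degree of freedom.
    """
    if b + c == 0:
        return 0.0, 1.0  # no discordant pairs: no evidence of disagreement
    stat = max(0, abs(b - c) - 1) ** 2 / (b + c)
    p = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-squared, 1 df
    return stat, p


def rate_ratio(d1, py1, d0, py0, z=1.96):
    """Rate ratio (invited vs. control) with a Wald 95% CI on the log scale."""
    rr = (d1 / py1) / (d0 / py0)
    se = math.sqrt(1 / d1 + 1 / d0)
    return rr, rr * math.exp(-z * se), rr * math.exp(z * se)


# Hypothetical counts, for illustration only.
print(mcnemar(15, 5))                          # statistic 4.05, p ≈ 0.044
print(rate_ratio(100, 200_000, 100, 100_000))  # RR 0.5, CI ≈ 0.38 to 0.66
```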
Results
According to the W-E Trial records, the total numbers of women in W-county were 38,589 in the ASP and 18,582 in the PSP; in E-county the numbers were 38,481 and 37,403. The corresponding figures in the OVC records were 38,562, 18,478, 38,405 and 37,145. These differences, all of less than 1%, did not materially influence the estimation of the primary results.
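The size of these differences can be checked directly from the reported population figures; a minimal Python sketch using only the numbers given above:

```python
# Randomized population sizes reported for each county and arm.
we_trial = {"W ASP": 38_589, "W PSP": 18_582, "E ASP": 38_481, "E PSP": 37_403}
overview = {"W ASP": 38_562, "W PSP": 18_478, "E ASP": 38_405, "E PSP": 37_145}


def percent_differences(reference, other):
    """Percentage by which `other` falls short of `reference`, per group."""
    return {g: 100 * (reference[g] - other[g]) / reference[g] for g in reference}


for group, pct in percent_differences(we_trial, overview).items():
    print(f"{group}: {pct:.2f}%")
```

Every group differs by less than 1%, the largest difference being about 0.69% in the E-county PSP.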
W-county
Table 2 shows all cases included in either the local trial records for W-county or the 1st OVC records or both, with the endpoint in each data set cross-tabulated. Of the 1053 cases included in the local trial records, the OVC included 925 cases (88%). Conversely, of the 972 cases included in the OVC records, the local trial included 925 breast cancer cases (95%). Of the 443 deaths to 1993 included in both datasets, there were 24 (5%) disagreements regarding determination of cause of death (type G disagreement). Of the total 199 disagreements, whether with respect to case inclusion or to cause of death, 175 (88%) pertained to case inclusion rather than cause of death. For both the ASP and the PSP, the overview was less likely to classify a death as from breast cancer. The magnitude of this tendency did not differ significantly between ASP and PSP.
Table 2. W-county outcomes tabulated against overview outcomes (agreements in bold)
Incl, included; BCD, breast cancer death; DOC, death from other causes; PSP, passive study population, not invited; ASP, active study population, invited
Table 3 shows the reasons for disagreement between the two breast cancer case datasets, in the ASP and PSP separately. The largest group of disagreements in the ASP was type D, mainly due to women with screen-detected in situ lesions who were included in the W-E dataset but not in the overview. In the PSP, most disagreements were of type B, relating to the date of diagnosis. These mostly resulted from women in the PSP who were diagnosed at the first screen but who, owing to reporting delays, were not entered into the NCR until after the trial had closed. The OVC regarded these women as diagnosed only at the date of reporting to the register and therefore excluded them (see Table 1, category B). Disagreements with respect to death from breast cancer were mainly due to categories G (47%; disagreement about cause of death) and C (29%; use of the cause of death register without reference to date of diagnosis) in the ASP, and to categories G (disagreement about cause of death) and B (32%; definition of date of diagnosis) in the PSP.
Table 3. Categorized disagreements between W-county trial records and 1st OVC records
PSP, passive study population, not invited; ASP, active study population, invited
Table 4 shows the breast cancer deaths and corresponding relative risks (RR) for the W-county part of the W-E Trial according to the original trial endpoint, the 1st OVC endpoint and the JRC conclusive dataset derived after review of all information by the JRC (i.e. the original trial data corrected for clerical errors and supplemented with cases lost to the trialists through migration). The OVC result is more conservative than those based on the original trial endpoint and on the JRC conclusive dataset. All analyses show a significant mortality reduction in the ASP.
Table 4. Trial mortality result for W-county from original local trial endpoint, 1st OVC endpoint and the JRC conclusive dataset
PSP, passive study population, not invited; ASP, active study population, invited
E-county
Table 5 shows cross-tabulation of the local trial endpoint records for E-county with the 2nd OVC records, for all women with breast cancer in either or both datasets. Of the 1415 women with breast cancer included in the local trial records, the 2nd OVC included 1298 (92%). Of the 1398 cases included in the 2nd OVC records, the local trial included 1298 (93%). Of the 655 deaths to 1996 included in both datasets, there were 53 (8%) disagreements. Of the total 279 disagreements, 217 (78%) pertained to case inclusion, 53 (19%) to cause of death and 9 (3%) to vital status at 31 December 1996. For both the ASP and the PSP, the 2nd OVC was less likely to classify a death as from breast cancer. This tendency was significantly stronger in the PSP (59% vs. 52%; P = 0.03).
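The percentages reported for both counties, and the pooled figures given in the Abstract, follow from a handful of counts. A Python sketch using only numbers reported in the text (for E-county, the 53 cause of death and 9 vital status disagreements are combined as outcome disagreements):

```python
def reconcile(n_trial, n_ovc, n_both, outcome_disagreements):
    """Summarize agreement between two case lists linked on person identifier.

    n_trial, n_ovc: cases in each list; n_both: cases in both lists;
    outcome_disagreements: cases in both lists with a differing outcome
    (cause of death or vital status).
    """
    inclusion = (n_trial - n_both) + (n_ovc - n_both)  # cases in one list only
    total = inclusion + outcome_disagreements
    return {
        "union": n_trial + n_ovc - n_both,
        "inclusion": inclusion,
        "total": total,
        "trial_cases_in_ovc_pct": 100 * n_both / n_trial,
        "ovc_cases_in_trial_pct": 100 * n_both / n_ovc,
        "inclusion_share_pct": 100 * inclusion / total,
    }


w = reconcile(1053, 972, 925, 24)        # W-county vs. 1st OVC
e = reconcile(1415, 1398, 1298, 53 + 9)  # E-county vs. 2nd OVC
union = w["union"] + e["union"]
total = w["total"] + e["total"]
print(union, total, round(100 * total / union))               # 2615 478 18
print(round(100 * (w["inclusion"] + e["inclusion"]) / total))  # 82
```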
Table 5. E-county outcomes tabulated against 2nd OVC outcomes (agreements in bold)
Incl, included; BCD, breast cancer death; DOC, death from other causes; PSP, passive study population, not invited; ASP, active study population, invited
The reasons for disagreements are shown in Table 6. As in W-county, the largest group of disagreements in the ASP (40%) was of type D, absence of trial cases from the NCR. In the PSP, however, disagreements were spread across categories D (19%; absence of the case from the NCR), G (22%; disagreement about cause of death) and K (26%; miscellaneous clerical errors and other reasons). With respect to breast cancer death, disagreements were dominated by categories G (disagreement about cause of death) and C (date of diagnosis) in both the ASP and the PSP.
Table 6. Categorized disagreements between E-county trial records and 2nd OVC records
PSP, passive study population, not invited; ASP, active study population, invited
Table 7 shows the E-county trial result with respect to breast cancer mortality using the original trial endpoint, the 2nd OVC endpoint and the conclusive endpoint after review of all sources by the JRC (i.e. the original trial data plus correction for the clerical errors and cases lost to the trialists due to migration). The trial endpoint and the JRC conclusive dataset both show a significant 20–23% reduction in mortality, whereas the 2nd OVC result shows a non-significant 10% reduction.
Table 7. Trial mortality result for E-county from original local trial endpoint, 2nd OVC endpoint and the JRC conclusive dataset
PSP, passive study population, not invited; ASP, active study population, invited
Associations with disagreement
We also investigated whether study group (ASP/PSP), county or age was significantly related to the likelihood of disagreement about breast cancer death. Among the 685 cases classified as breast cancer deaths by the W and E local committees, the OVC or both, there was no significant association of study group with disagreement (P = 0.2). The proportion of disagreement was higher in E-county than in W-county, but the difference did not attain statistical significance (P = 0.09). There was, however, a significant effect of age at randomization on the probability of disagreement (P < 0.001). In both counties, disagreement increased with age (Figure 1).
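For the comparison of disagreement between study groups, the chi-squared test reduces to a 2×2 table with one degree of freedom, which a stdlib-only Python sketch can compute. The counts are hypothetical. The age comparison involves more than two categories and hence more degrees of freedom, for which a library routine such as `scipy.stats.chi2_contingency` would be the usual choice.

```python
import math


def chi2_2x2(table):
    """Pearson chi-squared test (no continuity correction) for a 2x2 table.

    table: [[a, b], [c, d]], e.g. rows = ASP/PSP, columns = disagreement yes/no.
    Returns (statistic, p-value) on 1 degree of freedom.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, math.erfc(math.sqrt(stat / 2))  # chi-squared upper tail, 1 df


# Hypothetical counts, for illustration only.
print(chi2_2x2([[20, 80], [10, 90]]))  # statistic ≈ 3.92, p ≈ 0.048
```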
Figure 1. Percentage disagreement between W-E and 2nd OVC by age, in 604 cases classed as breast cancer deaths by one or both sources
Discussion
In this study, the Swedish Cancer Society's Joint Review Committee (JRC) investigated disagreements between the breast cancer incidence and death data as recorded in the original Swedish Two-County Trial, based on individual patient records and determination of cause of death by an expert committee, and those in the OVC (the 1st OVC for W-county and the 2nd OVC for E-county), which were based on the National Cancer Registry and the Cause of Death Register. For the purposes of this study, we had full access to the original W-E Trial data, the original data collected for the OVC, individual medical records, and register data from the regional tumour registries for the respective counties. Registration of new breast cancer diagnoses is mandatory by law in Sweden, and the completeness of registration of breast cancer is over 98%.16 Thus, we were able to determine the reason for discrepancy in every individual case, and no discrepancies were left unexplained.
Our main empirical finding is that, of the 2615 cases included by the W-E Trial or the OVC, there were 478 (18%) disagreements about inclusion/exclusion of women in the trial or determination of cause of death. The vast majority of these concerned inclusion/exclusion rather than cause of death. Most disagreements arose from design decisions in the OVC concerning issues such as the definition of age and the last date of inclusion in the study, and the use of a register rather than clinical records for case definition and cause of death determination. Disagreement about whether a death included in both the W-E Trial and the OVC was from breast cancer was relatively rare. We also found that the likelihood of disagreement about cause of death was not significantly affected by county or trial arm, but was significantly higher in older patients. These findings have implications both for the interpretation of screening effects and for methodological issues in overviewing original research.
The combined results of the two counties showed a significant breast cancer mortality reduction associated with the offer of screening by any of the three endpoint criteria. Using the JRC conclusive dataset, the combined RR was 0.69 (95% CI 0.58–0.83). Thus, the overall interpretation was not sensitive to these differences in design. In W-county, the result was significant by all three criteria, whereas in E-county it was significant using the original trial endpoint and the JRC conclusive endpoint, but not using the 2nd OVC endpoint. The JRC conclusive result included some women with breast cancer previously missed by the trialists because of migration but picked up by the NCR or CDR.
The remit of the JRC was not to determine which of the endpoints was correct. However, the E-county results make clear that the combination of differing cause of death determinations and inclusion/exclusion rules made a crucial difference to the primary result. For the field of secondary prevention, it is highly relevant to understand how such modest disagreements can make such a difference to the outcome of a trial with a total of 133,065 subjects. The answer is that the disagreements only needed to affect the small minorities of subjects classified as dying from breast cancer within the trial arms of one county (E-county) within the larger trial. In the ASP of E-county, disagreements with respect to cause of death and eligibility for inclusion caused a loss of 16 and a gain of 28 breast cancer deaths, a net increase of 12. In the PSP, there was a loss of 36 breast cancer deaths and a gain of 25, a net loss of 11 (Table 5). Thus the 2nd OVC classification of eligibility and cause of death gave a 7% higher death rate in the ASP and a 6% lower death rate in the PSP, sufficient to convert a statistically significant 20% reduction in mortality into a non-significant 10% reduction. Notably, even if the inclusion criteria had been identical and the only differences had been the disagreements over cause of death, the E-county result would still have been non-significant.
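The step from a 20% to a 10% reduction can be checked with simple arithmetic: raising the ASP death rate by 7% and lowering the PSP death rate by 6% multiplies the rate ratio by 1.07/0.94. A minimal Python sketch using the rounded figures given above, so the result matches the reported value only approximately:

```python
def shifted_rr(rr, asp_rate_change, psp_rate_change):
    """Rate ratio (ASP vs. PSP) after relative changes in each arm's death rate."""
    return rr * (1 + asp_rate_change) / (1 + psp_rate_change)


net_asp = 28 - 16  # breast cancer deaths gained minus lost in the ASP
net_psp = 25 - 36  # breast cancer deaths gained minus lost in the PSP
rr_ovc = shifted_rr(0.80, 0.07, -0.06)
print(net_asp, net_psp, round(rr_ovc, 2))  # 12 -11 0.91
```

The result, about 0.91, is consistent with the reported non-significant 10% reduction; the small gap reflects rounding of the published percentages.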
The effect of misclassification of exposure has been studied extensively in epidemiology;17–19 when it is non-differential with respect to disease outcome, it tends to dilute estimated effects. Although less fully researched, misclassification of outcome has also been shown to cause underestimation of exposure–outcome associations.20 Disagreement rates between the OVC and W-E classifications were 18% in both counties. Discrepancies of this magnitude suggest misclassification probabilities of about 10%, which would be likely to dilute observed associations by approximately 33%.21 The differences between the W-E and OVC results are smaller than this for W-county and rather larger for E-county. That they are proportionally larger for E-county is probably because disagreement rates were differential between trial arms. The general implications are that the poorer the classification, the greater the potential for missing a true effect; that differential misclassification may increase the potential bias; and that the more thorough the classification effort, the more sensitive the comparison is likely to be.
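One simple way to see how an 18% disagreement rate can correspond to roughly 10% misclassification, and how non-differential misclassification dilutes a rate ratio, is the Python sketch below. It assumes that the two sources err independently and with equal probability p, so that they disagree exactly when one of them errs, giving d = 2p(1 − p); the death counts are hypothetical. It illustrates the general mechanism only and does not reproduce the calculation behind the cited 33% figure.

```python
import math


def misclassification_from_disagreement(d):
    """Per-source misclassification probability p implied by disagreement rate d.

    Assumes two sources err independently, each with probability p, so that
    d = 2p(1 - p). Returns the root in [0, 0.5].
    """
    return (1 - math.sqrt(1 - 2 * d)) / 2


def observed_deaths(true_bcd, other_deaths, sensitivity, specificity):
    """Expected number classed as breast cancer deaths under imperfect classification."""
    return sensitivity * true_bcd + (1 - specificity) * other_deaths


p = misclassification_from_disagreement(0.18)
print(round(p, 3))  # 0.1

# Hypothetical arms with equal person-years: true RR = 70/100 = 0.70.
asp = observed_deaths(70, 100, sensitivity=0.9, specificity=0.9)
psp = observed_deaths(100, 100, sensitivity=0.9, specificity=0.9)
print(round(asp / psp, 2))  # 0.73, closer to the null than 0.70
```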
All these circumstances underline the importance of using an expert panel to determine cause of death when the individual study units contain few events. Others have regarded the determinations of such an expert committee as the gold standard,22 even when they have concluded that national death register information is adequate in comparison.23 The OVC obtained results closer to those of the original trial when, in the 1st OVC, it used an expert endpoint committee.2
The finding that disagreement about cause of death increased with age is also of general interest. It accords with the findings of the 1st OVC, in which four clinicians not involved in the trials independently determined cause of death; discordance at the initial review was 5%, 5%, 13% and 19% in women aged 40–49, 50–59, 60–69 and 70–74 years at randomization, respectively.13 This probably reflects the increasing difficulty of determining cause of death with age, for several reasons: a mixed clinical picture due to increasing co-morbidity; death occurring more often at home or in a nursing home without a clinical examination shortly before death; a very low probability of autopsy; and greater uncertainty about the origin of any metastases if another malignancy has also been diagnosed during follow-up. With long-term follow-up, the information that the woman is a trial participant, and hence that determination of her cause of death may be important, may also be lost.
The results of the JRC review show that the disagreements were due to design differences between the clinical intervention trial approach of the original W-E Trial and the register-study design used by the OVC. This leads to a more general observation: design decisions in either an original study or a subsequent overview that may at first glance seem trivial, such as defining a date for the end of the trial, can influence basic and important study features such as the number of included subjects. Design differences between original studies and overviews therefore have to be taken into account when the overview does not adhere to the original designs, and it should be investigated whether the interpretation is sensitive to such design conflicts. An example is the inclusion of women with in situ tumours as cases in the original study, contrasted with the decision in the 1st and 2nd OVC to include only women registered with an invasive cancer. This decision made an especially large difference in the ASP. Thus, seemingly minor deviations from the original study design may not be neutral to the evaluation of the randomized trial. In this case, the decision contributed mainly to the difference in the number of cases reported in the original trial and in the 1st and 2nd OVC, and little to the evaluation of breast cancer mortality.
Is there a role for registry data in the evaluation of primary or secondary prevention interventions? It would seem so where the research involves millions of person-years and large numbers of cause-specific deaths, as in large primary and secondary prevention studies, because misclassifications are then likely to be heavily outnumbered by reliable observations.3,4 For individual trials of smaller size, however, it is more reliable to determine case status and cause of death individually by an expert committee.
Conclusion
The following points are suggested by the above results:
The conclusion that invitation to mammography screening was associated with a significant breast cancer mortality reduction remains robust after a full examination of disagreements between the original Two-County Trial endpoints and those of the Swedish overview. Disagreements about actual cause of death were a minority of the overall disagreements and were common only for older cases; the majority of disagreements related to inclusion or exclusion.
The use of the overview inclusion criteria and the national registry data for determination of breast cancer deaths led to a substantial change in the result for one of the two counties illustrating that non-differential misclassification of the main endpoint tends to drive results towards the null.
Thus, for trials of modest size, it would appear more prudent to rely on trial logistics with close individual monitoring of case status, covariates and outcome status, such as determination of cause of death based on all available clinical information.
When secondary research does not adhere to the protocols of the primary research projects included, the consequences of such design differences should be investigated and reported. Seemingly trivial design decisions may have significant impact on the result and are not always neutral to the randomized design.
Footnotes
Acknowledgements
The study was supported by grants from the Swedish Cancer Society and the American Cancer Society. We thank Sherry Yueh-Hsia Chiu from the Institute of Preventive Medicine, Division of Biostatistics, College of Public Health at the National Taiwan University for excellent help and Robert Smith from the American Cancer Society for valuable discussions and advice.
Lars Holmberg, Stephen Duffy and Jan Frisell oversaw the comparison and coordinated the analyses. Lars Holmberg and Stephen Duffy drafted the report. Jan Frisell and Lars Holmberg were the principal investigators for the grants that supported the study. László Tabár and Bedrich Vitak were the principal investigators for the W and E trial parts, respectively, and provided all data for the W-E Trial. Jan Frisell and Lennarth Nystrom were the principal and the coordinating investigators for the Overview committee, respectively, and Lennarth Nystrom provided the Overview data. Stephen Duffy and Amy Yen made the statistical analyses. All authors had full access to the data, contributed to the comparison process and the interpretation of the analyses, and revised the manuscript for intellectual content. Lars Holmberg is the guarantor for the study.
