Abstract
Objectives
To clarify the relationship between facility-level mammography interpretive volume and breast cancer screening outcomes.
Methods
We calculated annual mammography interpretive volumes from 2000–2009 for 116 facilities participating in the U.S. Breast Cancer Surveillance Consortium (BCSC). Radiology, pathology, cancer registry, and women’s self-report information were used to determine the indication for each exam, cancer characteristics, and patient characteristics. We examined the effect of annual total volume and percentage of mammograms that were screening on cancer detection rates using multinomial logistic regression adjusting for age, race/ethnicity, time since last mammogram, and BCSC registries. “Good prognosis” tumours were defined as screen-detected invasive cancers that were <15 mm, early stage, and lymph node negative at diagnosis.
Results
From 3,098,481 screening mammograms, 9,899 cancers were screen-detected within one year of the exam. Approximately 80% of facilities had annual total interpretive volumes of >2,000 mammograms, and 42% had >5,000. Higher total volume facilities were significantly more likely to diagnose invasive tumours with good prognoses (odds ratio [OR] 1.32; 95% confidence interval [CI] 1.10–1.60, for total volume of 5,000–10,000/year v. 1,000–2,000/year; p-for-trend <0.001). A concomitant decrease in tumours with poor prognosis was seen (OR 0.78; 95%CI 0.63–0.98 for total volume of 5,000–10,000/year v. 1,000–2,000/year).
Conclusions
Mammography facilities with higher total interpretive volumes detected more good prognosis invasive tumours and fewer poor prognosis invasive tumours, suggesting that women attending these facilities may be more likely to benefit from screening.
Introduction
While there is compelling evidence that mammography leads to a decrease in breast cancer mortality, benefit is largely gained through early detection of some invasive tumours.1–4 There is widespread agreement that the quality of mammography has to be high for mammography to maximize mortality benefits. 5 Defining and measuring quality in mammography interpretation are challenging. Mammography quality can be measured using characteristics of the images, or by numerous measures of interpretive performance, such as the proportion of women called back for additional imaging, sensitivity, specificity, and positive predictive value. Another important measure of quality is the rate of detecting early invasive cancer,4,5 as it is only by diagnosing and treating invasive cancers early, and thereby preventing progression to more advanced disease, that mammography can achieve a mortality benefit.1–4
Volume-outcome relationships have been identified across broad areas of medicine, including for cancer-directed treatments.6,7 Although there has been considerable work published on variation in breast screening performance by physician volume,8–15 less has been published on facility volume. Mammography facilities vary significantly in the volume of mammograms interpreted, 16 but whether this variation relates to quality is unclear. Some evidence suggests that facility-level mammography volume may be associated with better mammography interpretive performance, although this relationship has not been consistently demonstrated.15,17,18 Two studies examining facility-level cancer detection (one in the United Kingdom 18 and one in Canada 15) demonstrated a positive association between facility mammography volume and overall cancer detection rate, even when accounting for radiologist volume. 15
Most analyses of mammography quality focus on measures of sensitivity (the proportion of cancers for which the mammogram detects the cancer) and specificity (the proportion of women without cancer who have a normal test result). However, it is the detection of good prognosis tumours (small, early stage, and lymph node negative cancers), rather than the detection of all cancers, that would be expected to influence mortality rates and/or excess morbidity. Here we assess the association between facility interpretive volume and the rate of detecting good prognosis invasive tumours. We hypothesized that the detection rate of more favourable prognosis tumours may be associated with greater facility mammography volume because readers in those facilities would have more experience and skill.
Methods
Study Population and Data Sources
This study included data from 116 facilities participating in one of seven U.S. Breast Cancer Surveillance Consortium (BCSC) breast imaging registries (San Francisco Mammography Registry, Colorado Mammography Advocacy Project, Carolina Mammography Registry, New Hampshire Mammography Network, Vermont Breast Cancer Surveillance Consortium, Group Health Registry, New Mexico Mammography Project). These registries collect information on mammography performed at participating facilities in their defined catchment areas and link this information to state tumour registries or regional Surveillance Epidemiology and End Results programmes, to obtain population-based cancer data. 19 Demographic and breast cancer risk factor data including age and first-degree family history are collected using a self-reported questionnaire completed at each screening mammogram. Time since last mammogram is both self-reported and derived from observed BCSC registry data. Both the facility at which the mammogram is performed and the facility at which it is interpreted are tracked separately through registry protocols. The BCSC registries and Statistical Coordinating Center received institutional review board approval for active or passive consenting processes and a Federal Certificate of Confidentiality and other protections for participating women, physicians, and facilities. All procedures are Health Insurance Portability and Accountability Act compliant. 19
Mammography Data and Mammographic Volume Definition
Interpretive volumes (total and screening) were calculated monthly for facilities from 2000–2009. Mammography exams that the radiologist indicated were performed for screening, and not for the additional evaluation of a prior mammogram, short-interval follow-up, or the evaluation of a breast symptom, were considered screening exams.20–22 These monthly volume measures were then aggregated to generate time-varying facility measures of annual screening volume, total volume, and percent of total volume that is screening, as in prior studies.13,14,21
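The aggregation of monthly counts into a prior-year volume measure can be illustrated with a minimal Python sketch. This is not the study's code: the record layout (a dictionary keyed by facility, year, and month), the function name `prior_year_volume`, and the choice of a rolling window covering the 12 calendar months before the exam month are all assumptions made for illustration.

```python
def prior_year_volume(monthly, facility, year, month):
    """Sum screening and total counts over the 12 months before (year, month).

    `monthly` is a hypothetical mapping:
        (facility, year, month) -> (screening_count, total_count)
    Returns (annual_screening, annual_total, percent_screening).
    Months with no record are treated as zero volume (an assumption).
    """
    screening = 0
    total = 0
    y, m = year, month
    for _ in range(12):
        # Step back one calendar month, wrapping January to December.
        m -= 1
        if m == 0:
            y, m = y - 1, 12
        s, t = monthly.get((facility, y, m), (0, 0))
        screening += s
        total += t
    # Percent of total volume that is screening; undefined if no volume.
    pct = 100.0 * screening / total if total else None
    return screening, total, pct


# Illustrative data only: a facility reading 400 exams a month, 300 of them screening.
example = {("A", 2000, m): (300, 400) for m in range(1, 13)}
print(prior_year_volume(example, "A", 2001, 1))
```

Because the window moves with each exam month, the same facility can fall into different volume categories for different mammograms, which is what a time-varying measure implies.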
The unit of analysis was at the mammogram level. At each mammogram, woman-level data were collected including information on age, race and ethnicity, screening history, and breast cancer risk factors. For measuring our outcomes, we included screening mammograms performed from 2001–2009 among women with no prior history of breast cancer, mastectomy, or breast augmentation. We were primarily interested in whether the interpretive volume at a facility in the year prior to each study screening mammogram was associated with detection of good or poor prognosis invasive tumours (defined below) diagnosed within the year following the screening mammogram, after controlling for important woman-level characteristics.
Breast Cancer Cases and Tumour Characteristics
Women diagnosed with invasive breast carcinoma within one year of a positive screening mammogram and before their next screening mammogram were considered to have screen-detected cancer.21,23 Tumour characteristics were collected from tumour registries and pathology databases and included: size (small, <10 mm; moderate, 10 to <15 mm; large, ≥15 mm), AJCC 6th edition stage (early, 1, 2a; late, 2b, 3, 4), 24 and invasive nodal status (negative, positive).13,14,21 “Good prognosis” tumours were defined as screen-detected invasive cancers that were: small or moderate size (<15 mm), early stage (stage 1, 2a), and lymph node negative at diagnosis. If a screen-detected tumour was large (≥15 mm), late stage (2b, 3, 4), or lymph node positive, we classified it as a poor prognosis tumour.13,14,21
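The classification rule above can be written as a short Python sketch. The function name, the argument encoding, and the "unknown" category for cases that cannot be classified from the available fields are assumptions for illustration; in the study, missing tumour characteristics were handled by multiple imputation rather than by a separate category.

```python
EARLY_STAGES = {"1", "2a"}      # AJCC 6th edition, early
LATE_STAGES = {"2b", "3", "4"}  # AJCC 6th edition, late


def classify_prognosis(size_mm, stage, node_positive):
    """Classify a screen-detected invasive cancer as 'good', 'poor', or 'unknown'.

    size_mm       -- tumour size in mm, or None if missing
    stage         -- one of "1", "2a", "2b", "3", "4", or None if missing
    node_positive -- True, False, or None if missing
    """
    # Poor prognosis: any one adverse feature is sufficient.
    if (
        (size_mm is not None and size_mm >= 15)
        or stage in LATE_STAGES
        or node_positive is True
    ):
        return "poor"
    # Good prognosis: all three favourable features must be known and present.
    if (
        size_mm is not None
        and size_mm < 15
        and stage in EARLY_STAGES
        and node_positive is False
    ):
        return "good"
    # Hypothetical fallback: no adverse feature recorded, but data incomplete.
    return "unknown"
```

Note the asymmetry: "good" requires all three criteria, whereas a single adverse feature yields "poor", so a small node-positive tumour is classified as poor prognosis.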
Analysis
We examined the distribution of mammograms, and of overall and screen-detected cancer cases, according to the volume measures (annual screening volume, annual total volume, and % screening) of the interpreting facility in the year prior to the mammogram.13,14 We also characterized women’s age, race/ethnicity, and time since prior mammogram in relation to those measures. The distributions of invasive tumour characteristics (size, stage, and nodal status) were calculated within each volume measure level. We also calculated crude, unadjusted rates of detection of cancers with each tumour characteristic within these volume measure categories. To examine the association between facility interpretive volume and detection of cancers with different tumour characteristics, we modelled outcomes (detection of ‘good prognosis invasive cancers’ and ‘poor prognosis invasive cancers’) with multinomial logistic regression. The models were adjusted for the potentially confounding factors of age, race/ethnicity, BCSC registry, and time since prior mammogram.
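A crude detection rate of the kind described above is a simple ratio scaled to 1,000 screening mammograms. The sketch below is illustrative only; the function name `rate_per_1000` is hypothetical, and the counts used in the example are the overall figures reported in this paper (3,098,481 screening mammograms, 9,899 screen-detected cancers, 580 of them with unknown prognosis).

```python
def rate_per_1000(cases, mammograms):
    """Crude (unadjusted) detection rate per 1,000 screening mammograms."""
    if mammograms <= 0:
        raise ValueError("mammograms must be positive")
    return 1000.0 * cases / mammograms


# Overall counts reported in this paper.
overall = rate_per_1000(9899, 3098481)
unknown = rate_per_1000(580, 3098481)
print(round(overall, 2), round(unknown, 1))
```

The second value rounds to the 0.2 per 1,000 reported for cancers with unknown tumour prognosis.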
We used generalized estimating equations with an independent working correlation and robust standard errors to account for clustering of exams at the reading facility. Separate models were fit for each of the volume measures. We estimated multinomial odds ratios (OR) and 95% confidence intervals (CI) for each volume level, and calculated a test of trend across the volume categories using a Wald test. We performed multiple imputation via chained equations to account for missing data on invasive tumour characteristics, race/ethnicity, and mammography history prior to fitting our outcome models, and we adjusted statistical inference estimates accordingly using Rubin's rules. 25
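To make the pooling step concrete, the following Python sketch applies Rubin's rules to a set of per-imputation log odds ratios and converts the pooled result to an OR with a 95% CI. It is a simplified illustration, not the study's analysis code: the function names are hypothetical, the input values are invented, and the interval uses a normal approximation (z = 1.96) rather than the t reference distribution with Rubin's degrees of freedom.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates (e.g. log odds ratios) with Rubin's rules.

    estimates -- point estimate from each imputed data set
    variances -- squared standard error from each imputed data set
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching lists from at least two imputations")
    q_bar = mean(estimates)                 # pooled point estimate
    within = mean(variances)                # average within-imputation variance
    between = variance(estimates)           # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between  # Rubin's total variance
    return q_bar, total


def odds_ratio_ci(log_or, var, z=1.96):
    """Convert a log odds ratio and its variance to (OR, lower, upper)."""
    se = math.sqrt(var)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Invented values for three imputed data sets.
log_or, total_var = pool_rubin([0.2, 0.3, 0.4], [0.01, 0.01, 0.01])
print(odds_ratio_ci(log_or, total_var))
```

The between-imputation term widens the interval relative to any single imputed data set, reflecting the extra uncertainty introduced by the missing values.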
Results
Descriptives of the study population.
Totals shown at the facility level are based on measuring the average annual volume (and % that is screening) interpreted at the facility over the entire study period. At the mammogram level, however, these are measures based on volume interpreted at the facility in the year prior to the mammogram. Other tables use this latter measure of volume.
Unadjusted cancer detection rates per 1,000 screening mammograms.
Of the 9,899 screen-detected invasive cancers, 580 had unknown tumour prognosis. The cancer detection rate of invasive cancers with unknown tumour prognosis was 0.2 per 1,000 screening mammograms overall, and in subgroups defined by volume this rate ranged from 0.1 to 0.5 per 1,000 screening mammograms. Detail on the distribution of tumour characteristics among all screen-detected cancers is shown in
Adjusted (multinomial) odds ratios for the association between cancer detection and volume of the interpreting facilities in the year prior to the mammogram*.
Separate models were estimated for annual total volume and for % that is screening. Models are based on using generalized estimating equations to fit multinomial logistic regression models, accounting for clustering of exams by interpreting facility, and adjusting for mammography registry, age, time since prior mammography, and race/ethnicity. Multinomial odds ratios (OR) with 95% confidence intervals (CI) are shown relative to a reference volume level for each model. In addition to ORs and 95% CIs, we present the p-value for a trend test assessing whether risk of cancer detection tends to increase with increasing volume or percent that is screening. We performed multiple imputation to account for missing data (race, time since prior mammography, and tumour characteristics) prior to fitting our outcome models, and we adjusted statistical inference estimates accordingly using Rubin's rules.
Discussion
Our study is unique in examining interpretive quality as measured by characteristics of screen-detected cancers in relation to facility volume. Detecting invasive cancers that are small, early stage, and node-negative may yield the most mortality reduction – and morbidity benefit – compared with large, late stage, disseminated cancers, the detection of which may not notably improve mortality or morbidity, although we also note the potential for overdiagnosis. We found significant differences in mammographic outcomes by facility interpretive volume. As total volume increased, the rate of detecting “good prognosis” invasive cancers tended to increase. Even after adjusting for potential confounders and possible correlation between exams read at the same facility, the likelihood of detecting “good prognosis” invasive cancers increased significantly with increasing volume.
This study adds important evidence to the literature on mammography performance because few studies have focused on the relationship between facility interpretive volume and interpretive performance, particularly in terms of cancer detection rates and tumour characteristics. High facility volume has been hypothesized to be associated with improved outcomes, such as early detection of invasive cancers; however, the scant literature has been mixed and has measured various outcomes with differing volume measures.15–18,23,24 Also using BCSC data, Taplin et al. studied annual facility volume and screening mammography interpretive accuracy and found no association with sensitivity, 17 and Jackson et al. found no relationship between diagnostic mammography facility volume and the sensitivity or area under the receiver operating characteristic curve of diagnostic mammography. 26 In contrast, two international studies found positive associations of volume with screening mammography interpretive performance, both with cancer detection15,18 and one also with positive predictive value. 18 However, comparability is uncertain, given differing clinical practices outside of the U.S. The significant association we found between volume and detection of invasive cancers with “good” characteristics supports the volume-outcome relationship for mammography at the facility level.
The Mammography Quality Standards Act addresses radiologist, not facility, volume, with a minimum interpretive volume of 960 mammograms every two years. This is 5- to 10-fold lower than in other countries such as the UK, Canada, and Australia. 27 Facility volume is not currently assessed for quality assurance. Our results suggest better early invasive cancer detection above 2,000 mammograms interpreted annually at a facility, which is consistent with our prior findings suggesting improved screening performance for radiologists interpreting 2,000 exams annually.14,21 Measuring facility interpretive volume may be more practical, because many radiologists practice at more than one facility, making accurate tracking across multiple facilities challenging. Further, there are many small facilities, which may have more difficulty recruiting highly skilled breast imagers, may not have the most up-to-date equipment or technologists, may not have the resources to allow extensive follow-up of abnormal cases as part of ongoing quality control, 17 may not have the resources to permit extensive continuing medical education, and may not provide sufficient volume for radiologists to maintain a high level of clinical skill. Facility-level quality is also important, as many patients have some ability to choose which facility they attend. However, our findings are based on the facility at which mammograms were interpreted, which may not be apparent to patients if images taken at one facility are sent to another facility for interpretation.
This study had some limitations, which we addressed to the extent possible and note here. First, 5.9% of the screen-detected invasive cancer cases from the >3 million mammograms followed were missing tumour information, precluding their classification as good or poor prognosis, and these values were therefore imputed during the multiple imputation process. We are not able to study the relative contribution of radiologist-level v. facility-level volume influences, given that many radiologists work at multiple facilities, including facilities outside the BCSC. We also cannot be certain that women seen at each facility were at similar risk of breast cancer, although we explored overall cancer rates in Table 1 and adjusted for likely confounders. We chose not to include ductal carcinoma in situ (DCIS) cases because our conceptual framework was based on tumour prognosis in relation to volume as a marker for interpretive quality. In that framework, we hypothesized that higher volume would be associated with detection of more “good prognosis” tumours, which could conceivably yield a mortality benefit from early detection. For early invasive cancers, this argument is easier to make, but for DCIS, the desirability of more detection is in question, and much recent debate has arisen regarding whether detection of DCIS is a “benefit” or a “harm” of breast cancer screening. DCIS is therefore outside of the conceptual framework of this study. 28
Conclusion
This study shows a positive association of facility-level interpretive volume with detection of invasive tumours having “good” characteristics, and possibly a concomitant decrease in detection of poor prognosis tumours. Based on these findings, we speculate that facility-level mammography quality monitoring could be useful, and should focus, in part, on tumour characteristics. In addition, volume requirements for facilities may be considered, with at least an average of 2,000 mammograms annually recommended. This may be achievable now that most facilities use digital mammography, enabling small facilities to send their mammograms to be interpreted by larger facilities. Also, studies to isolate the mechanism by which volume affects quality may guide interventions to achieve similar performance gains among smaller volume facilities.
Footnotes
Acknowledgements
This work was supported by the American Cancer Society, made possible by a generous donation from the Longaberger Company's Horizon of Hope® Campaign (SIRSG-07-271, SIRSG-07-272, SIRSG-07-273, SIRSG-07-274, SIRSG-07-275, SIRGS-06-281, SIRSG-09-270, SIRSG-09-271), the Breast Cancer Stamp Fund, and the National Cancer Institute Breast Cancer Surveillance Consortium (HHSN261201100031C). This study was also supported by the National Cancer Institute R21 CA131698 and K24 CA125036. The collection of cancer and vital status data used in this study was supported in part by several state public health departments and cancer registries throughout the U.S. For a full description of these sources, please see: http://www.breastscreening.cancer.gov/work/acknowledgement.html. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Cancer Institute or the National Institutes of Health. Their work was invaluable to the success of this project. We also thank the participating women, mammography facilities and radiologists for the data they have provided for this study. A list of the BCSC investigators and procedures for requesting BCSC data for research purposes are provided at:
.
References
