Abstract
Objective
To compare outcome metrics of digital breast tomosynthesis (DBT) breast cancer screening with those of full-field digital mammography (FFDM); specifically, to compare recall rates by type of recalled finding, to assess whether screening with DBT versus FFDM changes biopsy recommendations, and to determine whether the likelihood of malignancy varies by lesion type depending on whether the lesion was detected on DBT or FFDM screening.
Methods
The outcomes of 22,055 FFDM and DBT screening mammograms performed at an academic institution between August 2015 and September 2016 were retrospectively reviewed. Performance of FFDM versus DBT screening was compared in terms of recall rate and the percentage of recalled lesions resulting in a cancer diagnosis, with subset analyses for specific mammographic findings.
Results
The recall rate was 10.6% for FFDM and 8.0% for DBT (
Conclusions
DBT screening confers an advantage in the detection of architectural distortion representing malignancy. The recall rate of asymmetries is reduced with screening DBT, probably because of reduced tissue superimposition. Calcifications pose a particularly difficult diagnostic challenge for breast imagers, regardless of screening mammogram type.
Introduction
Breast cancer is the most common malignancy affecting women worldwide.1 The American Cancer Society projects that more than 276,000 new breast cancer cases and more than 42,000 deaths will occur in the United States in 2020.2 Multiple studies have demonstrated that breast cancer screening programs reduce breast cancer mortality by 19–49%.3–5 Compared with full-field digital mammography (FFDM), screening with digital breast tomosynthesis (DBT) is associated with improved cancer detection rates,6 decreased false-positive rates,7–11 and reduced recalls.12 Additionally, invasive cancers detected on screening DBT may be smaller and more likely to be node negative than those detected on digital mammography, especially in younger women,13–15 suggesting that cancers detected on DBT may carry a better prognosis.
The Breast Imaging Reporting and Data System (BI-RADS)16 outlines four categories of findings that may be identified and recalled on mammography: asymmetry, mass, distortion, and calcifications. Few studies have performed lesion-level analyses of findings recalled on tomosynthesis, and these have centered predominantly on the role of screening DBT with synthesized two-dimensional (2D) mammography in comparison with digital mammography combined with DBT.6,17
The goal of our study was to compare outcome metrics of DBT breast cancer screening with those of FFDM; specifically, to assess whether recall rates vary by the type of recalled finding, whether biopsy recommendations differ between DBT and FFDM screening, and whether the likelihood of malignancy varies by lesion type depending on whether the lesion was detected on DBT or FFDM screening. Our overarching aim was to determine whether patterns exist that could be leveraged to increase true-positive recalls and decrease false-positive recalls.
Materials and methods
This study was approved by the institutional review board (IRB) and was Health Insurance Portability and Accountability Act (HIPAA) compliant. We retrospectively reviewed clinical data from 22,055 FFDM and DBT screening mammograms performed at three outpatient sites at a single academic institution between August 2015 and September 2016. All interpreting radiologists were dedicated breast imagers, with an average of 10.4 years of experience (range 1–22 years). Clinical follow-up was tracked through June 2019.
Clinical data and imaging findings were recorded using a clinically integrated data collection system, MachForm, as previously described.12 Additional clinical data were reviewed in the electronic medical record (Epic, Madison, WI). For each recalled patient, we recorded the date of the screening mammogram, the screening modality (DBT vs. FFDM), breast tissue density (almost entirely fatty, scattered areas of fibroglandular tissue, heterogeneously dense tissue, or extremely dense tissue), lesion type (calcifications, asymmetry, mass, or architectural distortion) according to the 5th Edition BI-RADS lexicon, number of lesions per patient, follow-up imaging results, and pathology results (if the lesion was biopsied).
Outcomes of recalled cases were categorized as either false positive or true positive. A false positive was defined as a BI-RADS 1, 2, or 3 assessment at diagnostic breast imaging evaluation with two years of cancer-free follow-up, or any benign pathology from biopsy. A true positive was defined as any case with malignant pathology on core biopsy or surgical excision within one year.12 Breast cancer was defined as invasive ductal carcinoma, invasive lobular carcinoma, or ductal carcinoma in situ (DCIS). Positive predictive value (PPV1) was defined as the percentage of positive screening examinations (BI-RADS 0) that resulted in a tissue diagnosis of cancer within one year.16
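The outcome definitions above can be expressed as a small decision rule. The Python sketch below is illustrative only: the `RecalledLesion` record, its field names, and the `classify_outcome` and `ppv1` helpers are hypothetical and are not part of the study's MachForm data collection. Labeling cases that meet neither definition (for example, fewer than two years of follow-up) as "indeterminate" is an assumption of this sketch. Note that PPV1 is defined per screening examination, whereas the classifier works per lesion.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecalledLesion:
    """Hypothetical per-lesion record; field names are illustrative assumptions."""
    diagnostic_birads: int                          # final assessment after diagnostic workup (1-5)
    biopsy_malignant: Optional[bool]                # None if the lesion was not biopsied
    months_to_malignant_pathology: Optional[float]  # core biopsy or surgical excision
    cancer_free_followup_years: float               # imaging follow-up without a cancer diagnosis


def classify_outcome(lesion: RecalledLesion) -> str:
    """Return 'TP', 'FP', or 'indeterminate' following the definitions in the text."""
    # True positive: malignant pathology on core biopsy or excision within one year.
    if (lesion.biopsy_malignant is True
            and lesion.months_to_malignant_pathology is not None
            and lesion.months_to_malignant_pathology <= 12):
        return "TP"
    # False positive: any benign pathology from biopsy.
    if lesion.biopsy_malignant is False:
        return "FP"
    # False positive: BI-RADS 1, 2, or 3 with two years of cancer-free follow-up.
    if lesion.diagnostic_birads in (1, 2, 3) and lesion.cancer_free_followup_years >= 2:
        return "FP"
    # Assumption of this sketch: everything else cannot yet be classified.
    return "indeterminate"


def ppv1(screens_with_cancer_within_1yr: int, bi_rads_0_screens: int) -> float:
    """PPV1 (%): share of positive screens (BI-RADS 0) with a cancer diagnosis within one year."""
    return 100.0 * screens_with_cancer_within_1yr / bi_rads_0_screens


# Hypothetical examples, not study data.
print(classify_outcome(RecalledLesion(4, True, 1.0, 0.0)))   # TP
print(classify_outcome(RecalledLesion(2, None, None, 2.5)))  # FP
print(classify_outcome(RecalledLesion(3, None, None, 1.0)))  # indeterminate
```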
All screening mammograms were performed on either an FFDM or a DBT mammography unit (Selenia and Dimensions, Hologic, Marlborough, MA). All patients were offered DBT screening and self-selected DBT or FFDM screening. Mammograms performed with DBT included a synthesized 2D image generated from the tomosynthesis data; FFDM views were not obtained in combination with DBT screening. Screening mammograms were interpreted on a dedicated workstation (SecurView, Hologic, Marlborough, MA).
Statistical analysis was performed using R 2019 (R Foundation for Statistical Computing, Vienna, Austria).18
The overall performance of FFDM versus DBT screening was compared in terms of recall rate and estimated PPV1 using the chi-squared test. In subset analyses, recall rates, final BI-RADS assessments, and pathology results were compared between the FFDM and DBT groups for calcifications, asymmetries, masses, and distortions, and for each of the four breast tissue density categories, using the chi-squared test or, when sample sizes were small, the Fisher exact test. A
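For readers reproducing the small-sample comparisons outside R, the two-sided Fisher exact test for a 2×2 table can be sketched with the Python standard library. This is a minimal sketch, assuming the convention of R's `fisher.test` for 2×2 tables (sum the probabilities of all tables with the same margins that are no more likely than the observed one); the function name `fisher_exact_two_sided` is hypothetical, and the example table is a textbook illustration rather than study data.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x, margins fixed.
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    # Feasible range of the top-left cell given the margins.
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance so tables tied with the observed one are included.
    return min(1.0, sum(p for p in map(prob, range(lo, hi + 1))
                        if p <= p_observed * (1 + 1e-7)))


# Textbook example: [[3, 1], [1, 3]] gives p = 34/70 (about 0.486).
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # 0.4857
```

Enumerating every table with the observed margins is exact and fast for the small cell counts where Fisher's test is typically chosen; the chi-squared approximation is preferable for large tables.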
Results
Study dataset
A total of 22,055 screening mammograms were performed at our institution between August 2015 and September 2016: 5029 using FFDM and 17,026 using DBT. Of these, 1887 were assigned a BI-RADS 0 assessment. The overall recall rate was 8.6%: 10.6% for FFDM (531/5029) and 8.0% for DBT (1356/17,026) (
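The recall-rate arithmetic, and a Pearson chi-squared test on the same counts, can be reproduced as follows. This is a sketch under stated assumptions: the 2×2 table (modality by recalled/not recalled) is built from the counts above, the helper names are hypothetical, and because the text does not state whether a continuity correction was applied, both the uncorrected statistic and the Yates-corrected version (the default of R's `chisq.test` for 2×2 tables) are computed. These recomputed values are illustrative and are not the authors' reported statistics.

```python
from math import erfc, sqrt


def recall_rate(recalls: int, exams: int) -> float:
    """Recall rate in percent: BI-RADS 0 screens / all screens."""
    return 100.0 * recalls / exams


def chi_squared_2x2(a: int, b: int, c: int, d: int, yates: bool = False):
    """Pearson chi-squared statistic and p-value (1 df) for [[a, b], [c, d]]."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Yates continuity correction, equivalent to subtracting 0.5 from |O - E| per cell.
        diff = max(0.0, diff - n / 2)
    stat = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of chi-squared with 1 df: P(X > s) = erfc(sqrt(s / 2)).
    return stat, erfc(sqrt(stat / 2))


ffdm_recalls, ffdm_exams = 531, 5029
dbt_recalls, dbt_exams = 1356, 17026

print(round(recall_rate(ffdm_recalls, ffdm_exams), 1))  # 10.6
print(round(recall_rate(dbt_recalls, dbt_exams), 1))    # 8.0
print(round(recall_rate(ffdm_recalls + dbt_recalls, ffdm_exams + dbt_exams), 1))  # 8.6

table = (ffdm_recalls, ffdm_exams - ffdm_recalls, dbt_recalls, dbt_exams - dbt_recalls)
for use_yates in (False, True):
    stat, p = chi_squared_2x2(*table, yates=use_yates)
    print(f"yates={use_yates}: chi2={stat:.1f}, p={p:.1e}")
```

With either variant the p-value falls far below 0.05, in line with the significantly lower DBT recall rate the authors report.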
Twenty-four-month follow-up and/or pathology data were available for 1338 patients with 1597 lesions (226 patients had two lesions identified for recall and 33 had three lesions identified for recall). Thirty-three lesions (32 lymph nodes and 1 dilated duct) were excluded from the analysis because of their small numbers, leaving 1564 lesions in the final analysis. Results are shown in Tables 1 and 2.
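The lesion counts in this paragraph can be cross-checked arithmetically. The sketch below assumes that every patient not mentioned in the parenthetical had exactly one recalled lesion; under that assumption, the stated total of 1597 lesions is reproduced only if the 33 patients with three lesions are counted within the 226 multi-lesion patients. This is our reading of the parenthetical, not a statement by the authors, and `total_lesions` is a hypothetical helper.

```python
from typing import Dict


def total_lesions(patients: int, patients_by_lesion_count: Dict[int, int]) -> int:
    """Total lesions, given how many patients had k > 1 lesions; all others had one."""
    multi = sum(patients_by_lesion_count.values())
    singles = patients - multi
    return singles + sum(k * n for k, n in patients_by_lesion_count.items())


patients = 1338
# Reading A: 226 patients with exactly two lesions plus 33 more with three.
reading_a = total_lesions(patients, {2: 226, 3: 33})
# Reading B: 226 patients with two or more lesions, 33 of whom had three.
reading_b = total_lesions(patients, {2: 226 - 33, 3: 33})
print(reading_a, reading_b)  # 1630 1597
print(reading_b - 33)        # 1564 lesions after excluding 32 lymph nodes and 1 dilated duct
```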
Outcome metrics for FFDM versus DBT screening stratified by finding type. FFDM: full-field digital mammogram; DBT: digital breast tomosynthesis; BI-RADS: Breast Imaging Reporting and Data System.
Fisher’s test performed due to small sample size.
Patients excluded from analysis.
Screening
A total of 1564 lesions were included in the final analysis: 724 asymmetries (46.3%), 443 masses (28.3%), 252 calcifications (16.1%), and 145 architectural distortions (9.3%). The overall distribution of recalled findings differed significantly between FFDM and DBT (
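The percentages above follow directly from the lesion counts. A short Python check (illustrative only; the dictionary simply restates the reported counts) confirms that the four categories sum to 1564 lesions and reproduces each share to one decimal place.

```python
recalled = {
    "asymmetry": 724,
    "mass": 443,
    "calcifications": 252,
    "architectural distortion": 145,
}

total = sum(recalled.values())
shares = {finding: round(100 * count / total, 1) for finding, count in recalled.items()}

print(total)   # 1564
print(shares)  # {'asymmetry': 46.3, 'mass': 28.3, 'calcifications': 16.1, 'architectural distortion': 9.3}
```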

Recalled mammographic findings: FFDM versus DBT.
Diagnostic
Final BI-RADS assessments after complete diagnostic evaluation are shown in Table 1. An asymmetry or architectural distortion identified on DBT screening was more likely to be assessed as BI-RADS 4 or 5 and recommended for biopsy at diagnostic workup than one detected on FFDM screening (
Before DBT-guided biopsy capability was acquired in 2017, our breast imagers faced a clinical dilemma with cases of architectural distortion visible only on DBT images and not on 2D images or ultrasound. Eighteen patients with a finding described as architectural distortion on the screening exam were given a BI-RADS 3 assessment at the diagnostic exam. Only 12 of these findings were described as architectural distortion on the diagnostic exam (the remaining findings were described as an asymmetry or had resolved). Five of the 12 cases (42%) were thought to represent postsurgical or postbiopsy change; a six-month follow-up was recommended to confirm stability, and no significant change was seen at follow-up. The remaining seven cases (58%) were described as either very subtle or stable compared with multiple prior exams; because of the lack of tomosynthesis-guided biopsy capability and/or low suspicion for malignancy, six-month follow-up was recommended. Four of these cases were ultimately biopsied because of interval change in appearance (three yielding complex sclerosing lesion or radial scar and one yielding infiltrating lobular carcinoma, noted below).
Pathology
All findings assessed as BI-RADS 4 or 5 underwent ultrasound-guided or stereotactic biopsy, and pathology results were classified as benign, high-risk, or malignant. In total, 112 of the 1564 lesions (7.2%) represented breast cancer. The likelihood of malignancy was greatest for recalled distortions (25/145, 17.2%), followed by calcifications (28/252, 11.1%) and masses (25/443, 5.6%), and lowest for asymmetries (34/724, 4.7%) (
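The per-category likelihood of malignancy and its ranking can be recomputed from the reported numerators and denominators. This sketch pools FFDM and DBT, as the paragraph above does; the variable names are hypothetical. It also checks that the four categories account for all 112 cancers among the 1564 lesions.

```python
# (cancers, recalled lesions) per finding type, pooled across FFDM and DBT.
malignant = {
    "architectural distortion": (25, 145),
    "calcifications": (28, 252),
    "mass": (25, 443),
    "asymmetry": (34, 724),
}

rates = {k: round(100 * c / n, 1) for k, (c, n) in malignant.items()}
ranking = sorted(rates, key=rates.get, reverse=True)

cancers = sum(c for c, _ in malignant.values())
lesions = sum(n for _, n in malignant.values())
print(ranking)  # ['architectural distortion', 'calcifications', 'mass', 'asymmetry']
print(cancers, lesions, round(100 * cancers / lesions, 1))  # 112 1564 7.2
```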

Likelihood of malignancy: FFDM versus DBT.
Stratified results for FFDM versus DBT screening are shown in Table 1. Biopsy of architectural distortion originally identified on DBT screening more often yielded malignant pathology (69.4% of biopsy recommendations and 21.0% of recalled cases) than biopsy of architectural distortion identified on FFDM screening (0%) (
There were 84 diagnoses of invasive cancer and 28 diagnoses of DCIS. As expected, the distribution of cancer type (invasive vs. in situ) differed by the type of lesion biopsied (
Discussion
Our data demonstrate that recall outcomes of tomosynthesis and digital screening mammography vary by the type of recalled finding. DBT confers its greatest advantage in the detection of architectural distortion representing cancer. Architectural distortion is more likely to be recalled on tomosynthesis screening than on 2D digital screening mammography (10.8% vs. 5.6%;
In contrast, asymmetries are less likely to be recalled on DBT than on FFDM. However, asymmetries recalled on DBT screening are more likely to be recommended for biopsy at subsequent diagnostic evaluation than those detected on 2D digital screening. The likelihood that a biopsied asymmetry will yield malignancy does not differ by initial screening mammogram type. Because tomosynthesis reduces the effects of overlapping breast tissue,19 an asymmetry seen on digital mammography may be shown on DBT to represent benign-appearing overlapping breast tissue.
Masses are more likely to be recalled on DBT than FFDM (33.1% vs. 17.0%;
There was no statistically significant difference in the recall rate of calcifications between DBT (with a synthesized 2D view only) and FFDM, consistent with prior studies.6,17 Nor was there a significant difference in the likelihood of malignancy.
When reviewing the final BI-RADS assessments of the lesion categories, we noted a distinct pattern for calcifications. Compared with other lesion types, recalled calcifications were more frequently recommended for short-term follow-up and more commonly recommended for biopsy to confirm a benign etiology. Calcifications were also the finding that most frequently led to a biopsy yielding a benign, high-risk lesion. These findings suggest that calcifications, while frequently benign, present an interpretive challenge to breast imagers. Because malignant calcifications more often represent DCIS than invasive cancer, and calcifications are even more often benign, less aggressive patterns of recall and of assessment after diagnostic evaluation may be considered.
In agreement with prior studies,9,20,21 our data demonstrate a significantly lower recall rate for DBT screening than for FFDM screening (8.0% vs. 10.6%). However, in contrast to prior studies,20,22–24 neither the difference in PPV1 nor the difference in false-positive rate between DBT and FFDM screening was statistically significant. The PPV1 for all screening mammograms was 5.7%, within the accepted range for screening mammography performance (3–8%),16 and the false-positive rate for all screening exams was 8.1%, similar to previously published rates (3.3–9.3%).8,25
Our study has several limitations. First, this was a nonrandomized, retrospective review. Women self-selected DBT or FFDM screening, which introduced the potential for allocation bias. Self-selection also produced a large difference in cohort size (5029 FFDM versus 17,026 DBT screening mammograms), and subgroup analyses were limited by smaller patient numbers. Second, some patients were excluded from the analysis because of incomplete documentation or loss to follow-up. We would expect recalled lesion types and false-positive cases to be randomly distributed between patients lost to follow-up and those with full imaging follow-up or histologic diagnosis; nonetheless, our estimated worst-case scenario calculations still place our data within standard accepted ranges and consistent with prior research. Third, our study was performed at a large academic institution, so our results may not generalize to other practice types. Fourth, the terms “asymmetry” and “mass” may be applied differently on FFDM than on DBT, potentially affecting our results; this limitation applies to all studies comparing these two modalities.
Conclusions
DBT provides an advantage to breast imagers, particularly in the setting of architectural distortion and asymmetries: it reduces recalls of asymmetries and increases the malignant yield of recalled architectural distortion. Calcifications pose a particularly difficult diagnostic challenge to the breast imager, with more frequent recommendations for short-term follow-up and biopsy of calcifications that prove to be false positives. These outcome patterns can inform practice. Architectural distortions may warrant greater suspicion when seen on DBT, as they are more likely to represent an invasive cancer. Calcifications may warrant less aggressive recall and management given their higher likelihood of a benign outcome.
Footnotes
Declaration of conflicting interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Tali Amir, Emily Ambinder, Eniola Oluyemi, Mary Kate Jones, Evan Honig, and Matthew Alvin have no conflicts of interest. Susan Harvey is employed by Hologic, Inc. Lisa Mullen received grant support from IBM Research, Cepheid, and the Mark Foundation for projects unrelated to this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
