
Abstract

Objective

To compare outcome metrics of digital breast tomosynthesis (DBT) breast cancer screening with those of full-field digital mammogram (FFDM); specifically, to compare recall rates by the type of recalled finding, and to assess whether biopsy recommendations and the likelihood of malignancy for each lesion type differ between findings detected on DBT and on FFDM screening mammograms.

Methods

The outcomes of 22,055 FFDM and DBT screening mammograms were retrospectively reviewed. The exams were performed at an academic institution between August 2015 and September 2016. Performance of screening with FFDM versus DBT was compared in terms of recall rate and percentage of recalled lesions resulting in a cancer diagnosis, with subset analyses performed for specific mammographic findings.

Results

The recall rate was 10.6% for FFDM and 8.0% for DBT (p < 0.001). Architectural distortion was more likely to be recalled on DBT screening than on FFDM screening (p = 0.002), and when detected on DBT was associated with an increased likelihood of malignancy (p = 0.008). Asymmetries were less likely to be recalled on DBT than on FFDM screening (p < 0.001), but more likely to be recommended for biopsy when detected on DBT. Compared with other finding types, calcifications more frequently required short-term follow-up or biopsy on both DBT and FFDM.

Conclusions

DBT screening confers an advantage in detection of architectural distortion representing malignancy. The recall rate of asymmetries is reduced with screening DBT, probably due to reduction of tissue superimposition. Calcifications pose a particularly difficult diagnostic challenge for breast imagers, regardless of screening mammogram type.

Keywords

Breast cancer screening, digital breast tomosynthesis, mammography

Introduction

Breast cancer is the most common malignancy affecting women worldwide.¹ The American Cancer Society projects that over 276,000 new breast cancer cases and over 42,000 deaths will occur in 2020 in the United States.² Multiple studies have demonstrated that breast cancer screening programs reduce breast cancer mortality by 19–49%.^3–5 Screening mammography with digital breast tomosynthesis (DBT) is associated with improved cancer detection rates,⁶ decreased false-positive rates,^7–11 and reduced recalls¹² compared to full-field digital mammogram (FFDM). Additionally, invasive cancers detected on screening DBT may be smaller and more likely to be node negative compared with those detected by digital mammography, especially in younger women,^13–15 suggesting cancers detected on DBT portend a better prognosis.

There are four categories of findings, outlined by the Breast Imaging Reporting and Data System (BI-RADS),¹⁶ which may be identified and recalled on mammography—asymmetry, mass, distortion, and calcifications. Limited studies have focused on lesion-level analysis of findings recalled on tomosynthesis imaging and have predominantly centered on the role of synthesized two-dimensional (2D) mammography screening with DBT in comparison with digital mammography combined with DBT.^6,17

The goal of our study was to compare outcome metrics of DBT breast cancer screening with FFDM; specifically, to assess whether recall rates vary by the type of recalled finding, if biopsy recommendations differ between screening with DBT versus FFDM, and whether the likelihood of malignancy varied by lesion type if detected on DBT or FFDM screening mammogram. Our overarching aim was to evaluate if patterns exist that may be leveraged to increase true-positive recalls and decrease false-positive recalls.

Materials and methods

This study was approved by the institutional review board (IRB) and was Health Insurance Portability and Accountability Act (HIPAA) compliant. We retrospectively reviewed clinical data from 22,055 FFDM and DBT screening mammograms performed at three outpatient sites at a single academic institution between August 2015 and September 2016. All interpreting radiologists were dedicated breast imagers, with an average of 10.4 years of experience (range 1–22 years). Clinical follow-up was tracked through June 2019.

As previously described,¹² clinical data and imaging findings were recorded using a clinically integrated data collection system, MachForm. Additional clinical data were reviewed in the electronic medical record (Epic, Madison, WI). For each patient recalled, we obtained the date of the screening mammogram, the modality of the screening mammogram (DBT vs. FFDM), breast tissue density (almost entirely fatty, scattered areas of fibroglandular tissue, heterogeneously dense tissue, or extremely dense tissue), the lesion type (calcifications, asymmetry, mass, or architectural distortion) based on the 5th Edition BI-RADS lexicon, number of lesions per patient, follow-up imaging results, and pathology results (if the lesion was biopsied).

Outcomes of the recalled cases were categorized as either false positive or true positive. A false positive was defined as BI-RADS 1 or 2 assessment on diagnostic breast imaging evaluation with two years of cancer-free follow-up, BI-RADS 3 assessment on diagnostic imaging evaluation with two years of cancer-free follow-up, or any benign pathology from biopsy. True-positive cases were defined as any case with malignant pathology on core biopsy or surgical excision within one year.¹² Breast cancer was defined as invasive ductal cancer, invasive lobular cancer, or ductal carcinoma in situ (DCIS). Positive predictive value (PPV1) was defined as the percentage of positive screening examinations (BI-RADS 0) that result in a tissue diagnosis of cancer within one year.¹⁶
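The outcome definitions above amount to a small decision rule. As a minimal sketch in Python (the study itself used R, and the function and argument names here are hypothetical, not taken from the study's data collection system), the rule might be written as follows. The text does not say how high-risk pathology or incomplete follow-up were labeled, so the sketch leaves those cases unresolved rather than guessing.

```python
from typing import Optional


def classify_recall(
    diagnostic_birads: int,
    pathology: Optional[str] = None,               # "benign", "malignant", or None if no biopsy
    months_to_malignant_dx: Optional[float] = None,
    cancer_free_followup_years: float = 0.0,
) -> str:
    """Label one recalled case using the definitions stated in the text."""
    # True positive: malignant pathology on core biopsy or surgical excision within one year.
    if (
        pathology == "malignant"
        and months_to_malignant_dx is not None
        and months_to_malignant_dx <= 12
    ):
        return "true positive"
    # False positive: any benign pathology from biopsy.
    if pathology == "benign":
        return "false positive"
    # False positive: BI-RADS 1, 2, or 3 at diagnostic evaluation
    # with two years of cancer-free follow-up.
    if (
        pathology is None
        and diagnostic_birads in (1, 2, 3)
        and cancer_free_followup_years >= 2
    ):
        return "false positive"
    # Anything else (lost to follow-up, high-risk pathology, ...) is not
    # covered by the stated definitions; this fallback label is an assumption.
    return "unresolved"


print(classify_recall(3, cancer_free_followup_years=2.0))
```

Keeping an explicit "unresolved" label makes it easy to apply a separate worst-case rule to incomplete cases later, rather than folding that assumption into the definitions themselves.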

All screening mammograms were performed on either a FFDM or DBT mammography unit (Selenia and Dimensions, Hologic, Marlborough, MA). All patients were offered DBT screening, and self-selected if they preferred to have DBT or FFDM screening. Mammograms performed with DBT included a synthesized 2D image generated from the tomosynthesis data. FFDM views were not obtained in combination with DBT screening. Screening mammograms were interpreted on a dedicated workstation (Hologic, SecurView, Marlborough, MA).

Statistical analysis was performed using the computing software program R 2019 (R Foundation for Statistical Computing, Vienna, Austria).¹⁸ The overall performance of screening with FFDM versus DBT was compared in terms of the recall rate and estimated PPV1 using the Chi-squared test. For subset analyses, the recall rate, final BI-RADS assessments, and pathology results were compared between the FFDM and DBT groups for calcifications, asymmetries, masses, and distortions, as well as for each of the four breast tissue density categories, using the Chi-squared test or, where sample sizes were small, the Fisher exact test. A p-value < 0.05 was considered statistically significant. Because cases lost to follow-up left the data set incomplete, an estimated PPV1 and false-positive rate were calculated under the assumption that all cases lost to follow-up were benign and thus false positives.

Results

Study dataset

A total of 22,055 screening mammograms were performed at our institution between August 2015 and September 2016: 5029 using FFDM and 17,026 using DBT. Of these, 1887 were given a BI-RADS 0 assessment. The overall recall rate was 8.6% (1887/22,055); 10.6% (531/5029) for FFDM and 8.0% (1356/17,026) for DBT (p < 0.001). The estimated overall PPV1 was 5.7% (107/1887); the PPV1 for FFDM was 5.1% (27/531) compared with 5.9% for DBT (80/1356) (p = 0.12). The estimated overall false-positive rate was 8.1% (1782/22,055); 10.0% (505/5029) for FFDM; and 7.5% (1277/17,026) for DBT.
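The percentages in this paragraph follow directly from the reported counts. The short Python sketch below recomputes them as an arithmetic check; it is an illustration rather than the authors' R code, and the dictionary and function names are hypothetical. The false-positive counts are taken as reported in the text, since they are estimates that treat cases lost to follow-up as benign.

```python
# Counts as reported in the text.
screens = {"FFDM": 5029, "DBT": 17026}      # screening mammograms
recalls = {"FFDM": 531, "DBT": 1356}        # BI-RADS 0 assessments
cancers = {"FFDM": 27, "DBT": 80}           # cancer diagnosis within one year
false_pos = {"FFDM": 505, "DBT": 1277}      # estimated false positives, as reported


def pct(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


def summarize(groups):
    """Recall rate, PPV1, and false-positive rate for one or more modalities."""
    n = sum(screens[g] for g in groups)
    r = sum(recalls[g] for g in groups)
    c = sum(cancers[g] for g in groups)
    f = sum(false_pos[g] for g in groups)
    return {"recall_rate": pct(r, n), "ppv1": pct(c, r), "fp_rate": pct(f, n)}


summary = {name: summarize([name]) for name in screens}
summary["Overall"] = summarize(list(screens))

for name, metrics in summary.items():
    print(name, metrics)
```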

Twenty-four-month follow-up and/or pathology data were available for 1338 patients with 1597 lesions (226 patients had two lesions identified for recall and 33 had three lesions identified for recall). Thirty-three lesions (32 lymph nodes and 1 dilated duct) were excluded from the analysis because of the small sample size, leaving 1564 lesions in our final analysis. Results are shown in Tables 1 and 2.

Table 1.

Outcome metrics for FFDM versus DBT screening stratified by finding type. FFDM: full-field digital mammogram; DBT: digital breast tomosynthesis; BI-RADS: Breast Imaging Reporting and Data System.

Outcome metric	Finding	FFDM, n (%)	DBT, n (%)	p-Value (χ² test)
Recall rate	Calcification	63 (13.6%)	189 (17.2%)	0.09
	Asymmetry	296 (63.8%)	428 (38.9%)	<0.001
	Mass	79 (17.0%)	364 (33.1%)	<0.001
	Distortion	26 (5.6%)	119 (10.8%)	0.002
Diagnostic assessment	Calcifications
	BI-RADS 1 or 2	14 (22.2%)	60 (31.8%)	0.18
	BI-RADS 3	17 (27.0%)	57 (30.2%)
	BI-RADS 4 or 5	32 (50.8%)	72 (38.1%)
	Asymmetry
	BI-RADS 1 or 2	231 (78.0%)	297 (69.4%)	0.04
	BI-RADS 3	37 (12.5%)	76 (17.8%)
	BI-RADS 4 or 5	28 (9.5%)	55 (12.8%)
	Mass
	BI-RADS 1 or 2	53 (67.1%)	237 (65.1%)	0.95
	BI-RADS 3	11 (13.9%)	54 (14.8%)
	BI-RADS 4 or 5	15 (19.0%)	73 (20.1%)
	Distortion
	BI-RADS 1 or 2	22 (84.6%)	66 (55.5%)	0.02^a
	BI-RADS 3	1 (3.9%)	17 (14.3%)
	BI-RADS 4 or 5	3 (11.5%)	36 (30.3%)
Pathology results	Calcifications
	Benign	18 (56.3%)	45 (62.5%)	0.71^a
	High risk	5 (15.6%)	8 (11.1%)
	Malignant	9 (28.1%)	19 (26.4%)
	Asymmetry
	Benign	16 (57.1%)	29 (52.7%)	0.42^a
	High risk	0 (0%)	4 (7.3%)
	Malignant	12 (42.9%)	22 (40.0%)
	Mass
	Benign	9 (60.0%)	50 (68.5%)	0.68^a
	High risk	1 (6.7%)	3 (4.1%)
	Malignant	5 (33.3%)	20 (27.4%)
	Distortion
	Benign	3 (100%)	8 (22.2%)	0.02^a
	High risk	0 (0%)	3 (8.3%)
	Malignant	0 (0%)	25 (69.4%)
Likelihood of malignancy of recalled finding	Calcification	9 (14.3%)	19 (10.1%)	0.49
	Asymmetry	12 (4.1%)	22 (5.1%)	0.62
	Mass	5 (6.3%)	20 (5.5%)	0.79^a
	Distortion	0 (0%)	25 (21.0%)	0.008^a

^a Fisher’s exact test performed due to small sample size.

Table 2.

Patients excluded from analysis.

Reason for exclusion	# Excluded
Incomplete data entry	448
Technical recall	3
Prior exams became available, obviating the need for recall	5
Lost to follow-up after BI-RADS 0 assessment	26
Lost to follow-up after BI-RADS 3 assessment	54
Lost to follow-up after BI-RADS 4 assessment	10
Passed away from causes unrelated to breast cancer	3
Total	548

Screening

A total of 1564 lesions were included in the final analysis: 724 asymmetries (46.3%), 443 masses (28.3%), 252 calcifications (16.1%), and 145 architectural distortions (9.3%). The overall distribution of recalled findings differed significantly between FFDM and DBT (p < 0.001) (Figure 1). Asymmetries were more likely to be recalled on FFDM than on DBT, representing 63.8% (296/464) versus 38.9% (428/1100) of recalled lesions, respectively (p < 0.001). Conversely, masses and architectural distortions were more likely to be recalled on DBT. Masses represented 33.1% (364/1100) of recalled lesions on DBT versus 17.0% (79/464) on FFDM (p < 0.001), and distortions represented 10.8% (119/1100) of recalled lesions on DBT compared with 5.6% (26/464) on FFDM (p = 0.002). The proportion of recalls for calcifications did not differ significantly between the two screening modalities (p = 0.09; Table 1). The overall distribution of recalled findings on FFDM and DBT screening did not differ significantly between dense and nondense breasts (p = 0.07).
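The per-finding comparisons above are 2×2 tests: a given finding type versus all other findings, by screening modality. The Python sketch below applies a chi-squared test with Yates' continuity correction to the reported counts. The correction is an assumption: it is what R's `chisq.test` applies by default to 2×2 tables, but the text does not state which options the authors used. The helper name is hypothetical, and the p-value uses the closed form for one degree of freedom.

```python
from math import erfc, sqrt

FFDM_LESIONS, DBT_LESIONS = 464, 1100

# Recalled lesions by finding type: (FFDM count, DBT count), from the text.
recalled = {
    "asymmetry": (296, 428),
    "mass": (79, 364),
    "calcification": (63, 189),
    "distortion": (26, 119),
}


def chi2_2x2_yates(a, b, c, d):
    """Yates-corrected chi-squared test for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    # Shrink |ad - bc| by n/2 (never below zero) before squaring.
    corrected = max(abs(a * d - b * c) - n / 2, 0.0)
    chi2 = n * corrected ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p = erfc(sqrt(chi2 / 2))
    return chi2, p


p_values = {}
for finding, (ffdm, dbt) in recalled.items():
    chi2, p = chi2_2x2_yates(ffdm, FFDM_LESIONS - ffdm, dbt, DBT_LESIONS - dbt)
    p_values[finding] = p
    print(f"{finding:14s} chi2 = {chi2:6.2f}  p = {p:.4f}")
```

Under these assumptions the p-values for asymmetries and masses fall well below 0.001, while the distortion and calcification values land close to those listed in Table 1.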

Figure 1.

Recalled mammographic findings: FFDM versus DBT.

Diagnostic

Final BI-RADS assessments after complete diagnostic evaluations are shown in Table 1. When an asymmetry or architectural distortion was identified on DBT screening mammogram, it was more likely to be assessed as a BI-RADS 4 or 5 and recommended for biopsy on diagnostic workup than when detected on FFDM screening (p = 0.04 and p = 0.02, respectively). There was no significant statistical difference in final BI-RADS distribution of masses (p = 0.95) and calcifications (p = 0.18) detected on screening by FFDM compared with DBT. False-positive calcifications were most frequently recommended for short-term follow-up in order to confirm they were benign, and were more commonly recommended for biopsy compared to other lesion types. Calcifications were also the most frequent finding that led to biopsy with the outcome of a benign, high-risk lesion.
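The final BI-RADS comparisons are tests of independence on 2×3 tables (modality by assessment category). The following Python sketch runs a Pearson chi-squared test on the counts from Table 1. It is illustrative only: the function names are hypothetical, the tail probability is computed with a closed-form recurrence for integer degrees of freedom rather than a statistics library, and distortions are omitted because Table 1 reports a Fisher exact test for them.

```python
from math import erfc, exp, gamma, sqrt


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df >= 1."""
    half = x / 2
    # Closed forms for df = 1 and df = 2 seed a recurrence that steps df by 2.
    if df % 2:
        q, nu = erfc(sqrt(half)), 1
    else:
        q, nu = exp(-half), 2
    while nu < df:
        q += half ** (nu / 2) * exp(-half) / gamma(nu / 2 + 1)
        nu += 2
    return q


def chi2_independence(table):
    """Pearson chi-squared test of independence for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for row, row_total in zip(table, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, chi2_sf(chi2, df)


# Rows: FFDM, DBT. Columns: BI-RADS 1 or 2, BI-RADS 3, BI-RADS 4 or 5 (Table 1).
birads_tables = {
    "calcifications": [[14, 17, 32], [60, 57, 72]],
    "asymmetry": [[231, 37, 28], [297, 76, 55]],
    "mass": [[53, 11, 15], [237, 54, 73]],
}

results = {name: chi2_independence(t) for name, t in birads_tables.items()}
for name, (chi2, df, p) in results.items():
    print(f"{name:14s} chi2 = {chi2:5.2f}  df = {df}  p = {p:.3f}")
```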

Prior to obtaining DBT-guided biopsy capability in 2017, our breast imagers encountered a clinical dilemma when faced with cases of architectural distortion only visible on DBT images, but not on 2D images or ultrasound. Eighteen patients who had a finding described as architectural distortion on screening exam were given a BI-RADS 3 assessment at diagnostic exam. Only 12 of these were described as architectural distortion on the diagnostic exam (the remaining cases were described as an asymmetry or resolved). Five of the twelve cases (42%) were thought to be post-surgical or post biopsy change, and a six-month follow-up was recommended to confirm stability, with no significant change at the follow-up exam. The remaining seven cases (58%) were either described as very subtle or stable compared to multiple prior exams. Due to the lack of tomosynthesis-guided biopsy capability and/or low suspicion for malignancy, six-month follow-up was recommended. Four of these cases were ultimately biopsied due to interval change in appearance (three yielding complex sclerosing lesion or radial scar and one yielding infiltrating lobular carcinoma, noted below).

Pathology

All findings described as BI-RADS 4 or 5 underwent ultrasound or stereotactic biopsy, and pathology results were classified as benign, high risk or malignant. In total, 112 of the 1564 lesions were found to represent breast cancers (7.2%). The likelihood of malignancy was greatest for recalled distortions (25/145, 17.2%), followed by calcifications (28/252, 11.1%), masses (25/443, 5.6%) and lowest for asymmetries (34/724, 4.7%) (p < 0.001) (Figure 2).

Figure 2.

Likelihood of malignancy: FFDM versus DBT.

Stratified results for FFDM versus DBT screening are shown in Table 1. Biopsy of architectural distortion originally identified on DBT screening mammogram more often resulted in malignant pathology (69.4% of biopsy recommendations and 21.0% of recalled cases) when compared to those identified on FFDM screening (0%) (p = 0.02 and p = 0.008, respectively). Note that one architectural distortion initially assessed as probably benign (BI-RADS 3) was recommended for biopsy at the one-year follow-up appointment, and resulted in a cancer diagnosis.
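Because no recalled distortion on FFDM proved malignant, the comparison of malignancy among recalled distortions (0/26 on FFDM versus 25/119 on DBT) relies on the Fisher exact test. The Python sketch below computes a two-sided Fisher p-value from the hypergeometric distribution, summing the probabilities of all tables no more likely than the observed one. This is the usual two-sided convention and is assumed here; the function name is hypothetical and the code is not the authors' R analysis.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum every table that is no more probable than the observed one
    # (a small relative tolerance guards against floating-point ties).
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )


# Rows: FFDM, DBT. Columns: malignant, not malignant (recalled distortions).
p_distortion = fisher_exact_two_sided(0, 26, 25, 119 - 25)
print(f"Fisher exact p = {p_distortion:.4f}")
```

Under these assumptions the result falls below 0.01, in line with the p = 0.008 reported for recalled distortions.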

There were 84 diagnoses of invasive cancers and 28 diagnoses of DCIS. As expected, the distribution of cancer (invasive vs. in situ) was different based on the lesion type biopsied (p < 0.001). DCIS was most often detected in the setting of calcifications (21/28; 75%), while all the other lesion types were more likely to result in invasive cancer diagnoses.

Discussion

Our data demonstrate that recall outcomes of tomosynthesis and digital screening mammogram vary by the type of recalled finding. DBT confers its greatest advantage in detection of architectural distortion representing cancer. Architectural distortion is more likely to be recalled on tomosynthesis screening mammogram than on 2D digital mammogram (10.8% vs. 5.6%; p = 0.002), and more likely to be recommended for biopsy when detected on tomosynthesis imaging (30.3% vs. 11.5%; p = 0.02). Importantly, when detected on tomosynthesis, distortions are associated with a greater likelihood of malignancy than those on 2D mammogram (21.0% vs. 0%; p = 0.008). Tomosynthesis uses multiple in-plane image acquisitions, which are reconstructed into 1 mm slices, reducing the effects of superimposed fibroglandular tissue. As a result, architectural distortion appears more conspicuous on DBT than FFDM.¹⁹

In contrast, asymmetries are less likely to be recalled on DBT compared to FFDM. However, when asymmetries are recalled on DBT screening mammogram, they are more likely to be recommended for biopsy on subsequent diagnostic evaluation than those detected on 2D-digital screening mammogram. There is no difference in the likelihood that a biopsied asymmetry will yield malignancy based upon the initial screening mammogram type. Because tomosynthesis imaging reduces the effects of overlapping breast tissue,¹⁹ an asymmetry on digital mammography may be identified as benign appearing, overlapping breast tissue on DBT.

Masses are more likely to be recalled on DBT than FFDM (33.1% vs. 17.0%; p < 0.001) but do not result in a statistically significant increased rate of biopsy recommendation and are not associated with an increased likelihood of malignancy. These results indicate that although there is increased conspicuity of masses on DBT, many masses detected on DBT represent benign masses, such as cysts. Of note, a finding that appears as an asymmetry on FFDM may appear as a mass on DBT, due to the lack of superimposed tissue. In this scenario, these recalled findings would fall under the category of “mass” in our DBT data set, and “asymmetry” in our FFDM data set, thus changing the numbers in lesion categories. This is supported by the fact that more masses were recalled in the DBT group.

There was no statistically significant difference in the recall rate of calcifications on DBT with synthetic reconstructed 2D view only, compared with FFDM, consistent with prior studies.^6,17 In addition, there was no significant difference in the likelihood of malignancy.

When reviewing final BI-RADS assessment of the lesion categories, we noted a unique outcome with calcifications. Recalled calcifications were more frequently recommended for short-term follow-up and were more commonly recommended for biopsy compared to other lesion types to confirm a benign etiology. Calcifications were also the most frequent finding that led to biopsy with the outcome of a benign, high-risk lesion. These findings suggest that calcifications, while frequently benign, present an interpretive challenge to breast imagers. As calcifications are more frequently DCIS than invasive cancer, and even more frequently are benign, consideration may be made for less aggressive patterns of recall and assessment following diagnostic mammogram evaluation.

In alignment with prior studies,^9,20,21 our data demonstrate a statistically significant lower recall rate for DBT screening mammogram compared with FFDM: 8.0% versus 10.6%, respectively. However, in contrast to prior studies,^20,22–24 the difference in PPV1 between DBT and FFDM screening exams was not statistically significant, nor was the difference in the false-positive rate. The PPV1 for all screening mammograms was 5.7%, within the accepted range of screening mammography performance (3–8%),¹⁶ and the false-positive rate of all screening exams was 8.1%, similar to prior published rates (3.3–9.3%).^8,25

Our study has several limitations. First, this was a nonrandomized, retrospective review. Women self-selected to have DBT or FFDM screening, which introduced the potential for allocation bias. The self-selection process also resulted in a large difference in cohort size, with 5029 screening mammograms performed using FFDM compared with 17,026 performed with DBT. Subsequent sublevel analysis was limited by smaller patient numbers. Second, some patients were excluded from the analysis due to incomplete documentation or loss to follow-up. In general, recalled lesion type and false-positive cases should be randomly distributed among those lost to follow-up and those for whom we have full imaging follow-up or histologic diagnosis. Nonetheless, our estimated worst-case scenario calculations still show our data to be in the standard accepted ranges and to be consistent with prior research. Third, our study was performed at a large academic institution; thus, our results may not be generalizable to other practice types. Fourth, the use of the terms “asymmetry” and “mass” may be different on FFDM compared to DBT, potentially impacting our results. This would be the case for all such studies comparing these two modalities.

Conclusions

DBT provides an advantage to breast imagers attempting to reduce recalls, particularly in the setting of architectural distortion and asymmetries. Calcifications pose a particularly difficult diagnostic challenge to the breast imager, with increased recommendations for short-term follow-up and biopsy for false-positive calcifications. Our data demonstrate that we can learn from these outcomes. Architectural distortions may need to be treated with greater suspicion when seen on DBT, as they are more likely to represent an invasive cancer. Calcifications may warrant less aggressive recall and management based on the increased likelihood of a benign outcome.

Footnotes

Declaration of conflicting interests

The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Tali Amir, Emily Ambinder, Eniola Oluyemi, Mary Kate Jones, Evan Honig, and Matthew Alvin have no conflicts of interest. Susan Harvey is employed by Hologic, Inc. Lisa Mullen received grant support from IBM Research, Cepheid, and the Mark Foundation for projects unrelated to this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Tali Amir

References

1. Bray, Ferlay, Soerjomataram, et al. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 2018; 68: 394–424.

2. Siegel, Miller, Jemal. Cancer statistics, 2020. CA Cancer J Clin 2020; 70: 7–30.

3. Lauby-Secretan, Scoccianti, Loomis, et al. Breast-cancer screening–viewpoint of the IARC working group. N Engl J Med 2015; 372: 2353–2358.

4. Nyström, Bjurstam, Jonsson, et al. Reduced breast cancer mortality after 20+ years of follow-up in the Swedish randomized controlled mammography trials in Malmö, Stockholm, and Göteborg. J Med Screen 2017; 24: 34–42.

5. Tabár, Dean, Chen, et al. The incidence of fatal breast cancer measures the increased effectiveness of therapy in women participating in mammography screening. Cancer 2019; 125: 515–523.

6. Ambinder, Harvey, Panigrahi, et al. Synthesized mammography: The new standard of care when screening for breast cancer with digital breast tomosynthesis? Acad Radiol 2018; 25: 973–976.

7. Houssami, Miglioretti DL. Digital breast tomosynthesis: A brave new world of mammography screening. JAMA Oncol 2016; 2: 725–727.

8. Skaane. Breast cancer screening with digital breast tomosynthesis. Breast Cancer 2017; 24: 32–41.

9. Friedewald, Rafferty, Rose, et al. Breast cancer screening using tomosynthesis in combination with digital mammography. JAMA 2014; 311: 2499–2507.

10. McCarthy, Kontos, Synnestvedt, et al. Screening outcomes following implementation of digital breast tomosynthesis in a general-population screening program. J Natl Cancer Inst 2014; 106: dju316.

11. Ciatto, Houssami, Bernardi, et al. Integration of 3D digital mammography with tomosynthesis for population breast-cancer screening (STORM): a prospective comparison study. Lancet Oncol 2013; 14: 583–589.

12. Honig, Mullen, Amir, et al. Factors impacting false positive recall in screening mammography. Acad Radiol 2019; 26: 1505–1512.

13. Conant, Barlow, Herschorn, et al., for the Population-based Research Optimizing Screening Through Personalized Regimen (PROSPR) Consortium. Association of digital breast tomosynthesis vs digital mammography with cancer detection and recall rates by age and breast density. JAMA Oncol 2019; 5: 635–642.

14. Kim, Kang, Shin, et al. Biologic profiles of invasive breast cancers detected only with digital breast tomosynthesis. Am J Roentgenol 2017; 209: 1411–1418.

15. Wang, Hardesty, Borgstede, et al. Breast cancers found with digital breast tomosynthesis: a comparison of pathology and histologic grade. Breast J 2016; 22: 651–656.

16. D'Orsi, Mendelson, Morris, et al. ACR BI-RADS atlas: breast imaging reporting and data system. Reston, VA: American College of Radiology

17. Zuckerman, Maidment ADA, Weinstein, et al. Imaging with synthesized 2D mammography: Differences, advantages, and pitfalls compared with digital mammography. Am J Roentgenol 2017; 209: 222–229.

18. R Core Team. R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing, 2019.

19. Conant EF. Clinical implementation of digital breast tomosynthesis. Radiol Clin North Am 2014; 52: 499–518.

20. Rose, Tidwell, Bujnoch, et al. Implementation of breast tomosynthesis in a routine screening practice: an observational study. Am J Roentgenol 2013; 200: 1401–1408.

21. Sharpe, Venkataraman, Phillips, et al. Increased cancer detection rate and variations in the recall rate resulting from implementation of 3D digital breast tomosynthesis into a population-based screening program. Radiology 2016; 278: 698–706.

22. Giess, Pourjabbar, et al. Comparing diagnostic performance of digital breast tomosynthesis and full-field digital mammography in a hybrid screening environment. Am J Roentgenol 2017; 209: 929–934.

23. Haas, Kalra, Geisel, et al. Comparison of tomosynthesis plus digital mammography and digital mammography alone for breast cancer screening. Radiology 2013; 269: 694–700.

24. Skaane, Bandos, Gullien, et al. Comparison of digital mammography alone and digital mammography plus tomosynthesis in a population-based screening program. Radiology 2013; 267: 47–56.

25. Mullen, Panigrahi, Hollada, et al. Strategies for decreasing screening mammography recall rates while maintaining performance metrics. Acad Radiol 2017; 24: 1556–1560.

Benefits of digital breast tomosynthesis: A lesion-level analysis
