Concordance between automated/semi-automated measurement and manual assessment of mammographic breast density in individuals undergoing breast cancer screening: A systematic review

Abstract

Introduction

Breast density is a risk factor for breast cancer and reduces the sensitivity of mammography. Manual breast imaging reporting and data system (BI-RADS) classification remains the clinical standard, but automated methods have been developed to improve reproducibility and efficiency. This review evaluated the concordance between automated/semi-automated measurements and manual assessments of mammographic breast density.

Methods

We systematically searched MEDLINE, Embase, Cochrane Database of Systematic Reviews, CENTRAL, Scopus, and Web of Science (2014 onwards) for studies comparing automated or semi-automated measurement with manual BI-RADS classification on 2D digital mammography. Eligible studies included ≥60% of participants from routine screening populations. Data extraction and risk of bias assessment followed a registered protocol (PROSPERO: CRD42024550250).

Results

There was good concordance between automated/semi-automated measurement and manual assessment of breast density in the 26 included studies. Meta-analysis of 13 Volpara studies showed a tendency for Volpara to classify more mammograms as dense than manual assessment, but the difference was not statistically significant and statistical heterogeneity was very high (pooled difference 0.03, 95% CI −0.03 to 0.10; I² = 98%). Studies of Quantra and other software showed broadly similar findings, but variability in software versions and BI-RADS editions limited comparability. Reporting of participant demographics was poor, so generalisability is unclear.

Conclusions

Automated breast density software, such as Volpara and Quantra, shows promising concordance with manual BI-RADS assessment and may enhance consistency in screening programmes. Heterogeneity across studies and limited information on representativeness preclude firm conclusions. Large-scale, standardised, and inclusive evaluations are needed to establish clinical utility.

Funding

National Institute for Health and Care Research

Keywords

Breast density, automated/semi-automated, concordance, BI-RADS, breast cancer screening, mammography

Introduction

In the UK, breast cancer is the most common type of cancer among women, accounting for 15% of all new cancer cases. Based on data from 2016 to 2018, there are approximately 56,000 new breast cancer cases in the UK annually, corresponding to more than 150 per day.¹ The UK breast cancer screening programme currently screens all women aged 50–70 years at 3-year intervals with mammography. Although breast cancer screening is highly successful in preventing breast cancer mortality (20-40% reduction in risk),^2–4 breast cancer deaths are still not prevented in a substantial proportion of people due to underdiagnosis.⁵

In the context of screening, breast density is of concern because women with high breast density have an increased risk of breast cancer compared to those with low breast density,⁶ and the sensitivity of mammography screening is lower in women with denser breasts.⁷ In clinical practice, the biologic continuum for breast density is categorised into four groups according to the American College of Radiology breast imaging reporting and data system (ACR BI-RADS) atlas,⁶ which is used as a clinical reference, with ‘A’ referring to breasts that are entirely fatty, ‘B’ referring to breasts with scattered areas of fibroglandular density, ‘C’ referring to breasts that are heterogeneously dense, and ‘D’ referring to extremely dense breasts. Women with extremely dense and moderately dense breasts (BI-RADS groups D and C) are at particular risk of underdiagnosis and may account for almost half of the screening population.⁸

In clinical practice, breast density assessment has traditionally been conducted by subjective manual assessment where the radiologist inspects mammograms to categorise the density of the breast. More recently, automated and semi-automated quantitative methods of breast density assessments have been developed to improve the reproducibility of breast density assessment, and these may improve workflow efficiency.⁹

Automated software systems such as Volpara and Quantra provide density measurements that correspond to the four BI-RADS density categories, but a review by Patterson et al. (2019) for the UK National Screening Committee (UK NSC) found that, while the test-retest reliability of automated methods was good, and reliability was better than human readers, there was a paucity of high-quality evidence and the concordance (agreement) between automated methods was variable.¹⁰ They concluded that automated methods cannot be used interchangeably to measure breast density. Since there is yet to be a comprehensive systematic review that evaluates concordance between automated and manual methods, the objective of this review was to determine the agreement (concordance) between automated/semi-automated measurement and manual assessment of mammographic breast density.

Methods

This systematic review was commissioned by the UK NSC via the UK National Institute for Health and Care Research (NIHR) and was conducted in accordance with the recommendations of the Cochrane Handbook for Systematic Reviews of Interventions¹¹ and reported in adherence with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.¹² The methods were pre-specified in a protocol and registered with the PROSPERO International Prospective Register of Systematic Reviews, available from: https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42024550250.

Two patient and public involvement (PPI) partners were part of the project's Advisory Group, which also included academic and clinical experts. One PPI partner has lived experience of undergoing mammography for routine breast screening, and the other has lived experience of breast cancer and undergoing mammography. PPI partners contributed to regular Advisory Group discussions and made recommendations at each stage of the project.

Most people who use the UK's breast screening programme identify as women, though not all do. While using exclusively gender-neutral language can enhance inclusivity, it may also reduce clarity. None of the studies included in our review reported data on non-binary participants. We have, therefore, chosen to use both ‘women’ and gender-neutral language where appropriate. We acknowledge that this is a compromise; however, when we refer to ‘women’, we ask readers to interpret this as including all individuals who use the breast screening service, not only those who identify as women.

Search strategy

Comprehensive search strategies were developed by an information scientist (PM) with input from expert advisors to identify studies of any design that compared manual assessment and automated/semi-automated measurement of breast density. The databases searched were MEDLINE, Embase, Cochrane Database of Systematic Reviews, CENTRAL, Scopus, and Web of Science. There were no restrictions on study type or language at the search stage, but results were limited to articles published from 2014 onwards. This start date was chosen to ensure that the search was comprehensive and captured recent technological advances. All references were exported to EndNote for recording and deduplication. The reference lists of all articles selected for full-text appraisal were screened for additional studies. Details of the search strategies are reported in Appendix 1 (see online Supplementary Material).

Study selection

Full-text articles of published studies were eligible for inclusion if they reported the agreement between automated or semi-automated measurement of breast density and manual (visual) assessment of breast density using the BI-RADS breast density scoring system⁶ (editions 3,¹³ 4,¹⁴ or 5)⁷ for 2D digital mammography or they reported the resources required to measure breast density for the two methods, and included at least 60% of participants in their sample who underwent mammography for routine breast cancer screening and had no prior history of breast cancer. Studies of synthetic or spectral mammography were deemed ineligible because 2D digital mammography is the standard imaging modality used in the UK National Breast Screening Programme. We aimed to include mammograms from participants who are representative of the UK general screening population, rather than those undergoing mammography for diagnostic indications or for surveillance for second primary or recurrent breast cancer. Case–control studies, systematic reviews, editorials, conference abstracts, letters, and opinion articles were not eligible for the review.

Two reviewers (CR and SD) independently screened 20% of the titles and abstracts to ensure consistency by comparing their results. The remaining citations were screened by a single reviewer (CR). All potentially relevant full-text articles were retrieved and assessed for inclusion by one reviewer (CR), with a second reviewer (MB, DC or SD) checking all articles labelled as unclear (20%). Any disagreements were resolved through discussion between reviewers.

We attempted to contact the corresponding authors of studies where details of the study population were unclear, provided the study included at least 200 participants (or 200 mammograms if the number of participants was not reported). Due to time constraints, we excluded studies with unclear population details that had fewer than 200 participants or 200 mammograms without attempting to contact the study authors.

Data extraction and risk of bias assessment

A single reviewer (CR) conducted data extraction using a pre-specified data extraction form that was developed with input from the Advisory Group and in accordance with guidance from the PRO-EDI initiative¹⁵ for considering equality, diversity, and inclusion of participant characteristics in evidence syntheses. The same reviewer conducted risk of bias assessment using an adapted version of the review body for interventional procedures (ReBIP) quality assessment tool for non-randomised comparative and case series studies. The ReBIP 18-item checklist was originally developed for the National Institute for Health and Care Excellence and was adapted from several quality assessment checklists and guidance documents, including the National Health Service (NHS) Centre for Reviews and Dissemination's guidance, Verhagen and colleagues, Downs and Black, and the generic appraisal tool for epidemiology.^16–19 The tool assesses bias and generalisability, sample definition and selection, description of the intervention, outcome assessment, adequacy of follow-up, and performance of the analysis. Individual ReBIP question items 1 to 12 were rated as ‘yes’, ‘no’ or ‘unclear’. A rating of ‘yes’ denoted the optimal rating for methodological quality. Items 13 to 18 of the checklist were considered unsuitable for the scope of the current review. A second reviewer (MI) checked 20% of the data extraction and risk of bias assessments.

Data analysis

Investigators have used a range of statistical methods to assess agreement between manual assessment and automated/semi-automated measurement, including the area under the curve, measures of diagnostic accuracy (sensitivity and specificity), Spearman's rank correlation coefficient, Pearson's correlation coefficient, and the kappa statistic. The choice of methods was at the discretion of each study's authors. Among studies reporting the kappa statistic, there was variation in whether the kappa was weighted and, if so, whether linear or quadratic weights were applied. In certain cases, only the kappa value was provided with no accompanying indication of precision. Furthermore, the basis of agreement differed across studies. In some, agreement was assessed for binary classifications (dense versus non-dense), while in others it related to the four density categories.
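To illustrate why the choice of weighting matters when comparing kappa values across studies, the following minimal sketch computes unweighted, linear-weighted, and quadratic-weighted Cohen's kappa for two raters. It is not taken from any included study: the function name `cohen_kappa`, the integer coding of BI-RADS categories A to D as 0 to 3, and the example ratings are all assumptions made for illustration.

```python
from typing import Sequence


def cohen_kappa(rater_a: Sequence[int], rater_b: Sequence[int],
                n_categories: int, weights: str = "unweighted") -> float:
    """Cohen's kappa for two raters using ordinal codes 0..n_categories-1.

    weights: "unweighted", "linear" or "quadratic" (hypothetical option names).
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n, k = len(rater_a), n_categories

    # Observed joint proportions of (rater A category, rater B category).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i: int, j: int) -> float:
        # Penalty for rating one image as category i and j respectively.
        d = abs(i - j)
        if weights == "unweighted":
            return 0.0 if d == 0 else 1.0
        if weights == "linear":
            return d / (k - 1)
        if weights == "quadratic":
            return (d / (k - 1)) ** 2
        raise ValueError("unknown weighting scheme")

    obs = sum(disagreement(i, j) * observed[i][j]
              for i in range(k) for j in range(k))
    exp = sum(disagreement(i, j) * row[i] * col[j]
              for i in range(k) for j in range(k))
    return 1.0 - obs / exp


# Hypothetical ratings in which every pair differs by exactly one category.
manual = [0, 1, 2, 3]
software = [1, 0, 3, 2]
for scheme in ("unweighted", "linear", "quadratic"):
    print(scheme, round(cohen_kappa(manual, software, 4, scheme), 3))
```

On these hypothetical ratings the same data give a kappa of about −0.33 (unweighted), 0.20 (linear), and 0.60 (quadratic), because weighted schemes penalise near-misses less than large disagreements. This is why kappa values computed under different weighting schemes are not directly comparable.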

Where studies reported the proportions of participants classified into breast density categories by both automated/semi-automated and manual methods, we analysed these proportions for studies using similar automated/semi-automated software to determine the ability of the software to consistently classify participants as having dense or non-dense breasts compared with manual assessment. For studies reporting numerical data for both automated/semi-automated and manual methods, we conducted a random effects meta-analysis to compare the proportions classified as dense and non-dense. This analysis was possible for studies comparing Volpara to manual assessment. Four of these studies were multi-arm studies. To avoid potential bias, we did not split the Volpara group when there were multiple control groups, nor did we combine control groups when these represented different manual assessments of the same participants. For the main meta-analysis, we selected the more recent version of Volpara when two were evaluated (Gemici et al., 2020),²⁰ the ratings of the breast imaging experts in Eom et al. (2018),²¹ observer 1 (selected at random) in Singh et al. (2016),²² and the most experienced radiologist in Rigaud et al. (2022).²³ Sensitivity analyses were also conducted using the excluded groups from the Gemici et al. (2020), Eom et al. (2018), and Singh et al. (2016) studies.

For each study, the proportion classed as dense by Volpara was compared with the proportion classed as dense by manual assessment. The difference in proportions and its associated standard error were used in a random effects meta-analysis fitted by restricted maximum likelihood. Heterogeneity was assessed using the I² statistic. The meta-analysis was conducted using the meta suite of Stata 19.²⁴
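The pooling step described above can be sketched as follows. This is an illustrative approximation rather than the analysis that was run: it substitutes the closed-form DerSimonian–Laird estimator of between-study variance for restricted maximum likelihood, and it assumes an independent-samples standard error for each difference in proportions, which is a simplification when the same mammograms are rated by both methods. The function names and all numbers are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def risk_difference(dense_auto: int, n_auto: int,
                    dense_manual: int, n_manual: int) -> Tuple[float, float]:
    """Difference in proportion classed as dense (automated minus manual).

    Assumption: the standard error treats the two proportions as independent.
    """
    p1, p2 = dense_auto / n_auto, dense_manual / n_manual
    se = math.sqrt(p1 * (1 - p1) / n_auto + p2 * (1 - p2) / n_manual)
    return p1 - p2, se


def random_effects_pool(effects: Sequence[float],
                        ses: Sequence[float]) -> Dict[str, float]:
    """Random effects pooling with the DerSimonian-Laird tau^2 estimator.

    Note: DerSimonian-Laird is swapped in here for simplicity; it is not REML.
    """
    w = [1.0 / se ** 2 for se in ses]                      # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                          # between-study variance
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0  # I^2 as a percentage
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]          # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    return {
        "pooled": pooled,
        "ci_low": pooled - 1.96 * se_pooled,
        "ci_high": pooled + 1.96 * se_pooled,
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical counts: (dense by software, n, dense by manual reading, n).
studies = [(520, 1000, 480, 1000), (300, 500, 240, 500), (410, 900, 430, 900)]
pairs = [risk_difference(*s) for s in studies]
result = random_effects_pool([d for d, _ in pairs], [se for _, se in pairs])
print({name: round(value, 3) for name, value in result.items()})
```

When the study estimates differ by much more than their standard errors would allow, Cochran's Q becomes large relative to its degrees of freedom and I² approaches 100%, even if the pooled difference is close to zero.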

Results

The literature searches identified 1032 citations, and 215 full-text reports were selected for eligibility assessment. Five reports were unavailable. We attempted to contact the corresponding authors of 28 reports where it was unclear whether the report (a) included an eligible population (n = 25),^25–49 (b) used eligible mammography (n = 2),^50,51 or (c) was a secondary report of a participant sample in another included study (n = 1).⁵² We received replies from five authors and were subsequently able to include two studies.^28,51

We excluded 11 studies^53–63 that used mammograms from databases of digitised film mammography, specifically the digital database of screening mammography⁶⁴ and the mammographic image analysis society databases.⁶⁵

In total, 26 reports were included. Details of the screening process are presented in Figure 1 and the list of excluded studies with the main reasons for exclusion is presented in Appendix 2.

Figure 1.

PRISMA flow diagram of the screening process. PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses.

Characteristics of the included studies

The characteristics of the included studies are detailed in Appendix 3. The included studies were conducted in Europe (n = 9: Sweden [n = 5],^51,66–69 the Netherlands [n = 2],^70,71 France [n = 1],⁷² Norway [n = 1]);⁷³ the USA (n = 4);^23,28,74,75 the Republic of Korea (n = 4);^21,76–78 Peru (n = 2);^79,80 and one study each was conducted in Argentina,⁸¹ Australia,⁸² Brazil,⁸³ India,²² Saudi Arabia,⁸⁴ and Turkey.²⁰ Finally, one study was conducted in both the UK and the USA.⁸⁵

The studies varied in whether they reported their units of analysis as the number of participants (total across studies, n = 29,784) or the number of mammograms (total across studies, n = 16,194). The youngest reported mean age of participants was 48.8 years²² and the oldest was 58.8 years.⁷⁰ Only one study reported the ethnicity of the participants.⁷⁸ None of the studies reported details of the socioeconomic status or transgender characteristics of the participants.

Five studies^51,66–69 obtained mammograms from the Malmö breast tomosynthesis screening trial.⁸⁶ It is unclear whether any participant overlap exists between these studies, but each study evaluated different automated software. Similarly, while it is unclear whether there is participant overlap between the two studies conducted in Peru, these studies evaluated different automated software. We therefore believe there are no concerns about duplication in these studies. Four studies explicitly reported evaluating raw (for processing/pre-processed) images;^66–68,79 however, most studies did not specify whether they evaluated raw or processed (for presentation) images.

Overall, the included studies were of moderate quality, even though many of the ReBIP checklist items were rated as unclear due to insufficient reporting by study authors. The results of the study-level quality assessment are provided in Appendix 4. The authors of 11 studies either developed the automated software or had associations with the manufacturers of the automated software (see Appendix 3 for details).^51,66–71,74,79,80,83

Overall concordance findings

Full-length results tables are provided in Appendix 5. Eight studies evaluated Volpara software,^20–22,68,70,75,76,84 two studies evaluated Quantra,^73,82 two evaluated Volpara and Quantra in the same study,^71,77 and three studies evaluated Volpara and other automated/semi-automated algorithms (Cumulus Hand Delineation and ImageJ software,⁸⁵ EfficientNetB0 deep learning software,²³ and AI-CAD Lunit INSIGHT MMG).⁷⁸ The findings of these studies are reported here. The other 11 studies evaluated various other individual automated software systems, and because of this heterogeneity their findings are not reported here but are summarised in Appendix 6.^28,51,66,67,69,72,74,79–81,83

In some studies, more than one agreement statistic was used. Across the included studies, three general types of agreement statistic were reported: kappa statistics, correlation coefficients, and percentage agreement. Nearly 70% of the reported statistics were kappa statistics, of which over 80% were linear- or quadratic-weighted Cohen's kappa. All studies included participants with a full range of breast density categories, so low prevalence in either dense or non-dense classifications was not an issue. Even in studies using four-level density agreement, where categories A and D had fewer participants, there were sufficient participants in categories B and C to avoid the need for a prevalence-adjusted kappa. We are, therefore, satisfied that the studies used appropriate methods to measure agreement for ordinal rating categories between different reviewers. Where studies had more than two reviewers, Fleiss' kappa was used to account for multiple raters.

Volpara software versus manual assessment

The results of the 13 studies that compared Volpara software with manual density assessment using BI-RADS are summarised in Table 1.^20–23,68,70,71,75–78,84,85 The studies evaluated 15 versions of Volpara compared against BI-RADS 4th and BI-RADS 5th editions. Two studies did not report the BI-RADS edition.^21,23 The different versions of Volpara and BI-RADS editions were considered sufficiently similar to be combined in our meta-analysis. The studies evaluated mammograms obtained using Hologic (n = 5),^20,23,70,71,77 GE Healthcare (n = 4),^21,75,76,78 Siemens AG (n = 1),⁶⁸ and Philips (n = 1)²² systems. Two studies did not report the mammography system.^84,85

Table 1.

Summary of results of the studies evaluating Volpara versus manual assessment of mammographic breast density.

Study ID	Sample size used in the analysis	Type of mammography	Automated software version and BI-RADS edition	Agreement value (95% CI)	Agreement interpretation^a
Gemici 2020²⁰	379 mammograms	FFDM Selenia, Hologic	Volpara Version 1.4.2 (n = 1399 mammograms)	Non-dense vs dense V 1.4.2 vs BI-RADS 4: κ −0.41 (NR)	Poor agreement
			Volpara Version 1.5.1 (n = 1399 mammograms)	V1.5.1 vs BI-RADS 4: κ −0.40 (NR)	Poor agreement
			BI-RADS 4^th edition (n = 379 mammograms) Two radiologists with 5 and 8 years' experience
Holland 2016⁷⁰	500 participants; 1000 mammograms	Digital mammography Lorad Selenia, Hologic	Volpara Version 1.5.0	4-way density Experienced radiologists: κ 0.73 to 0.78; PhD student: κ 0.77	Good agreement
Holland 2016⁷⁰	500 participants; 1000 mammograms	Digital mammography Lorad Selenia, Hologic	BI-RADS 4^th edition Three radiologists (R1, R2 and R3) with ≥8 years of experience in breast imaging and a PhD student (R4) with a medical degree and 2 years’ experience	Non-dense vs. dense Experienced radiologists: κ 0.63 to 0.70; PhD student: κ 0.71	Good agreement
Rigaud 2022²³	995 participants	FFDM Selenia Dimensions, Hologic	Volpara Version 3.4.1	4-way density κ 0.34 (range 0.17 to 0.48); Average percent agreement: 56% (range 43 to 64%)	Fair agreement
Rigaud 2022²³	995 participants	FFDM Selenia Dimensions, Hologic	BI-RADS (edition NR) 7 radiologists with 5–22 years’ experience		Fair agreement
van der Waal 2015⁷¹	992 mammograms	FFDM Selenia system, Hologic	Volpara Version 1.5.0	κ 0.80 (0.77, 0.82) Proportion agreement 65.4% Prediction of dense category AUC: 0.95 (0.94, 0.96) Accuracy at 8.0% cut-off Sensitivity: 84%; Specificity 91%	Good agreement
van der Waal 2015⁷¹	992 mammograms	FFDM Selenia system, Hologic	BI-RADS 5^th edition 3 experienced radiologists		Good agreement
Youk 2021⁷⁷	4000 participants	FFDM Lorad Selenia, Hologic	Volpara Version 3.1	κ 0.48 (0.46, 0.50)	Moderate agreement
Youk 2021⁷⁷	4000 participants	FFDM Lorad Selenia, Hologic	BI-RADS 5^th edition Three radiologists with 7, 10 and 14 years’ experience	κ 0.48 (0.46, 0.50)	Moderate agreement
Eom 2018²¹	1000 participants	FFDM Senographe DS, GE Healthcare	Volpara Version 1.5.12	Expert radiologists: κ 0.77 (0.75, 0.80); General radiologists: κ 0.71 (0.68, 0.74)	Good agreement
			BI-RADS (edition NR) Two breast imaging experts with >5 years’ experience	Non-dense vs. dense Expert radiologists: κ 0.83 (0.80, 0.87)	Very good agreement
			BI-RADS (edition NR) Two general radiologists with <5 years of experience	General radiologists: κ 0.73 (0.68, 0.77)	Good agreement
Lee 2015⁷⁶	860 participants	FFDM Senographe DS; GE Healthcare	Volpara Version 1.5.1	κ 0.80 (0.77, 0.83) ρ 0.86 (NR) P < 0.0001	Good agreement
Lee 2015⁷⁶	860 participants	FFDM Senographe DS; GE Healthcare	BI-RADS 4^th edition One radiologist with 6 years’ experience	κ 0.80 (0.77, 0.83) ρ 0.86 (NR) P < 0.0001	Good agreement
Lee 2022⁷⁸	488 participants 488 mammograms	FFDM Senographe Pristina, GE Healthcare	Volpara Version 3.4.1	4-way density: κ 0.50 (0.45, 0.56)	Moderate agreement
Lee 2022⁷⁸	488 participants 488 mammograms	FFDM Senographe Pristina, GE Healthcare	BI-RADS 5^th edition Three radiologists with 2-, 10- and 25-years’ experience	Non-dense vs. dense: κ 0.56 (0.48, 0.65)	Moderate agreement
Portnow 2022⁷⁵	200 mammograms	FFDM Senographe ES and Senographe DS, GE Healthcare	Volpara Version 1.5.1	V 1.5.1 vs. BI-RADS 4: κ 0.68 to 0.83^b	Good agreement
			BI-RADS 4^th edition Six radiologists with 23–30 years’ experience	V 1.5.1 vs. BI-RADS 4: κ 0.68 to 0.83^b	Good agreement
			Volpara Version 1.5.2	V 1.5.2 vs. BI-RADS 5: κ 0.76 to 0.85^b	Good agreement
			BI-RADS 5^th edition Six radiologists with 23–30 years’ experience	V 1.5.2 vs. BI-RADS 5: κ 0.76 to 0.85^b	Good agreement
Sartor 2016⁶⁸	8426 participants; 8426 mammograms	Digital mammography, Mammomat Inspiration, Siemens AG	Volpara Version 1.5.11	κ 0.55 (0.53, 0.56)	Moderate agreement
Sartor 2016⁶⁸	8426 participants; 8426 mammograms	Digital mammography, Mammomat Inspiration, Siemens AG	BI-RADS 4^th edition five breast radiologists with >10 years’ experience	κ 0.55 (0.53, 0.56)	Moderate agreement
Singh 2016²²	476 participants	FFDM MicroDose SI, Philips	Volpara Version 1.4.5	Observer 1: κ 0.40 (NR), ρ 0.73	Fair agreement Strong correlation
			BI-RADS 4^th edition Observer 1 (1/2 radiologists with 5–10 years’ experience)	Observer 2: κ 0.39 (NR), ρ 0.73	Fair agreement Strong correlation
			BI-RADS 4^th edition Observer 2 (1/2 radiologists with 5–10 years’ experience)
Alomaim 2020⁸⁵	With distractors: 92 mammograms Without distractors: 158 mammograms	Digital mammography (machine NR)	Volpara Version 1.5.0 (n = 122 mammograms)	κ 0.66, p < 0.001	Good agreement
			BI-RADS 4^th edition (n = 122 mammograms)	With distractors: κ 0.67 (NR) p < 0.001	Good agreement
			25 USA and 24 UK radiologists with >8 years’ experience (UK 12% ≤ 1 years’ experience)	Without distractors: κ 0.52 (NR) p < 0.001	Moderate agreement
Aloufi 2022⁸⁴	1022 participants	NR FFDM was used in the SNBCSP during the time mammography was conducted during the study (from 2012 to 2018)⁸⁷	Volpara Version 1.5.5.1	4-way density κ 0.35 (0.29, 0.39)	Fair agreement
Aloufi 2022⁸⁴	1022 participants		BI-RADS 5^th edition 11 radiologists (experience NR)	Non-dense vs. dense κ 0.53 (0.47, 0.60)	Moderate agreement

^a Source for the definitions of agreement interpretations: Kappa statistic, Altman (1999);⁸⁸ Spearman's correlation coefficient, Cohen (1988).⁸⁹

^b It is unclear whether data are reported for the four-way density or the non-dense versus dense comparison.

AUC: area under the curve; FFDM: full field digital mammography; κ: Kappa statistic; NR: not reported; ρ: Spearman's rank correlation coefficient; SE: standard error; SNBCSP: Saudi National Breast Cancer Screening Programme; V: Volpara; BI-RADS: breast imaging reporting and data system.

Concordance ranged from Kappa −0.40 to 0.83. Most studies (53.8%)^21,22,70,71,75,76,85 showed good agreement between Volpara and manual assessment with BI-RADS, both for categorising mammograms into the four density categories and for categorising them as non-dense or dense, although one of these studies, by Alomaim et al. (2020),⁸⁵ showed only moderate agreement for mammograms that did not contain image distractors.
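The verbal labels used for kappa values in Table 1 follow the bands attributed to Altman. As a minimal sketch, the mapping can be written as below, assuming the commonly quoted cut-points (up to 0.20 poor, 0.21 to 0.40 fair, 0.41 to 0.60 moderate, 0.61 to 0.80 good, 0.81 to 1.00 very good); the function name `interpret_kappa` is hypothetical.

```python
def interpret_kappa(kappa: float) -> str:
    """Map a kappa value to a verbal label.

    Assumption: cut-points are the commonly quoted Altman bands, with each
    upper bound included in the lower band (for example, 0.80 is "Good").
    """
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie between -1 and 1")
    if kappa <= 0.20:
        return "Poor"
    if kappa <= 0.40:
        return "Fair"
    if kappa <= 0.60:
        return "Moderate"
    if kappa <= 0.80:
        return "Good"
    return "Very good"


# Kappa values as reported in Table 1.
for value in (-0.41, 0.40, 0.55, 0.80, 0.83):
    print(value, interpret_kappa(value))
```

Values that sit on a band boundary, such as 0.40 or 0.80, change label with very small changes in kappa, so the verbal interpretation is best read alongside the reported confidence interval where one is available.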

The study by Eom et al. (2018) found very good agreement between Volpara Version 1.5.12 and visual assessment by expert radiologists for classifying mammograms into dense and non-dense categories, although this reduced to good agreement for the four density categories. In the same study, agreement between Volpara and general radiologists was good for both the four-way and two-way density classifications.²¹

Four studies showed moderate agreement between Volpara and BI-RADS although, of these, the study by Aloufi et al. (2022) found only fair agreement for categorising mammograms into the four density categories compared with moderate agreement for the dense/non-dense categories.⁸⁴ Two studies, by Rigaud et al. (2022)²³ and Singh et al. (2016),²² showed only fair agreement, and the study by Gemici et al. (2020) showed poor agreement.²⁰ The version of Volpara software, BI-RADS edition, and the type of mammography system used in the studies were not consistently associated with the strength of agreement between the Volpara and visual density assessments.

The meta-analysis comparing the density categorisation of Volpara and manual assessments is shown in Figure 2. Volpara classified a slightly higher proportion of mammograms as dense than manual assessment did. In Gemici et al. (2020),²⁰ two versions of Volpara were used and the most recent was used in our meta-analysis. In Eom et al. (2018)²¹ and Singh et al. (2016),²² there were two control groups; we chose breast imaging experts as the control group for Eom et al. (2018) and observer 1 as the control group for Singh et al. (2016). The sensitivity analyses using the alternative groups from these three studies gave pooled differences of 0.03 (95% CI −0.04, 0.10), 0.03 (95% CI −0.03, 0.09), and 0.04 (95% CI −0.03, 0.11), all consistent with the pooled difference of 0.03 (95% CI −0.03, 0.10) in the main meta-analysis. No study dominated the meta-analysis and none carried a very small weight, with study weights ranging between 7.3% and 9.6%. However, the I² statistic of 97.8% indicates considerable statistical heterogeneity between the studies.

Figure 2.

Meta-analysis of Volpara versus manual assessment of mammographic breast density. CI: confidence interval.

Quantra software versus manual assessment

The results of the four studies that compared Quantra software with manual assessment are summarised in Table 2. All studies used Selenia (Hologic) mammography systems. Concordance ranged from Kappa 0.54 to 0.84. Two studies, by van der Waal et al. (2015)⁷¹ and Ekpo et al. (2016),⁸² showed excellent or very good agreement between Quantra and BI-RADS density assessments, although this reduced to good agreement for the classification of the four density categories in the latter study. The study by Osteras et al. (2016)⁷³ showed good agreement, while the study by Youk et al. (2021)⁷⁷ showed moderate agreement between Quantra and BI-RADS assessments. As with the evaluations of Volpara software, the different Quantra versions and BI-RADS editions were not consistently associated with the strength of agreement between the different methods of density assessment. We were unable to analyse the proportions of mammograms classified into the different density categories because data for both Quantra and BI-RADS assessments were only available for the study by Youk et al. (2021).⁷⁷

Table 2.

Summary of results of the studies evaluating Quantra versus manual assessment of mammographic breast density.

Study ID	Sample size used in the analysis	Type of mammography	Automated software version and BI-RADS edition	Agreement value (95% CI)	Agreement interpretation^a
Ekpo 2016⁸²	292 participants (majority report)	NR Selenia Dimensions, Hologic	Quantra Version 2.0	Majority report 4-way density κ 0.79 (0.75, 0.84)	Good agreement
Ekpo 2016⁸²	292 participants (majority report)	NR Selenia Dimensions, Hologic	BI-RADS 4^th edition 3 RANZCR-certified breast radiologists (majority report – consensus of 2/3)	Non-dense vs. dense κ 0.84 (0.79, 0.87) Sensitivity: 91.3%; Specificity 83.6% AUC: 0.89 (0.82, 0.91)	Very good agreement
Osteras 2016⁷³	537 mammograms	FFDM Selenia Dimensions, Hologic	Quantra Version 2.0	κ 0.73 (0.67, 0.79)	Good agreement
Osteras 2016⁷³	537 mammograms	FFDM Selenia Dimensions, Hologic	BI-RADS 4^th edition five radiologists with 1 to 34 years’ experience	κ 0.73 (0.67, 0.79)	Good agreement
van der Waal 2015⁷¹	992 mammograms	FFDM Selenia system, Hologic	Quantra Version 1.3	Prediction of dense category AUC: 0.95 (0.94, 0.96)	Excellent agreement
van der Waal 2015⁷¹		FFDM Selenia system, Hologic	BI-RADS 5^th edition 3 experienced radiologists	Accuracy at 13.8% cut-off: sensitivity: 82%; specificity: 92%
Youk 2021⁷⁷	4000 participants	FFDM Lorad Selenia, Hologic	Quantra Version 2.1.1	κ 0.54 (0.52, 0.56)	Moderate agreement
Youk 2021⁷⁷	4000 participants	FFDM Lorad Selenia, Hologic	BI-RADS 5^th edition Three radiologists with 7-, 10- and 14-years’ experience	κ 0.54 (0.52, 0.56)	Moderate agreement

^a Source for the definitions of agreement interpretations: Kappa statistic, Altman (1999);⁸⁸ AUC statistic, Hosmer et al. (2013).⁹⁰

AUC: area under the curve; FFDM: full field digital mammography; κ: Kappa statistic; NR: not reported; RANZCR: Royal Australian and New Zealand College of Radiology; SE: standard error; BI-RADS: breast imaging reporting and data system.

Discussion

This evidence synthesis includes 26 studies, published during the last decade, that evaluated the concordance between automated/semi-automated and manual assessment of breast density for 2D digital mammography. However, these findings must be considered against a backdrop of accelerating advances in artificial intelligence, which mean that automated systems for measuring breast density are evolving rapidly (see below).⁹¹ As a result, some currently used and future software may supersede those examined in this review. Nevertheless, to the best of our knowledge, this review provides the most comprehensive and up-to-date synthesis of available evidence on agreement between automated and manual assessments of breast density in routine screening populations.

Overall, the included studies were of moderate quality, although many of the ReBIP checklist items were rated as unclear due to insufficient reporting in the full-text publications. Our findings show that, overall, there is good concordance between automated and manual assessment of breast density. Nevertheless, there is considerable variation both between automated technologies and between different versions of the same automated software. Robust conclusions are difficult to draw because of the small number of studies evaluating similar versions of automated software using comparable BI-RADS editions. The largest body of evidence for one type of automated software is from studies evaluating Volpara. This is unsurprising given the widespread use of Volpara in clinical practice. Our meta-analysis indicates that Volpara may be more likely to categorise mammograms as dense compared with manual assessment, but the difference was not statistically significant and the I² statistic indicated the presence of considerable heterogeneity between studies. It should be noted that Volpara measures the volume of dense tissue, while visual assessment estimates the visible area of dense tissue; therefore, manual ratings may not capture volume information.

Three studies examined the impact of radiologists’ experience levels on agreement.^21,70,72 Their findings indicate that the agreement between manual and automated density assessments is consistent regardless of whether the assessments were performed by senior/experienced radiologists or junior/general radiologists. This suggests that radiologists’ experience does not influence the level of agreement between manual and automated density assessments.

One study indicated that agreement between Volpara and manual density assessment is greater for mammograms that contain image distractors.⁸⁵ The authors of that study⁸⁵ noted that this finding was unexpected and indicates the need for further research to explore the impact of mammogram image quality on automated and manual breast density assessment. It should be noted that the results of this study were derived from a small sample of 250 mammograms, making it difficult to draw firm conclusions.

A 2024 multi-society paper, presenting the views of radiology societies in the USA, Canada, Europe, Australia, and New Zealand, highlights that automation bias (the tendency of humans to favour AI-generated decisions over those made by humans) can lead to errors if the AI system is incorrect.⁹ This risk may increase when radiologists are fatigued or when there is limited capacity to supervise or validate the AI output. Automation bias has been evaluated in other areas of clinical decision support, where the findings highlight the risk of errors if clinicians are over-reliant on, or uncritical of, automated decisions.^92–94 This emphasises the need for specialised training of those who use automated tools as part of their clinical decision-making. Similarly, in scenarios where autonomous AI systems continue to learn and adapt over time, such automated systems require ongoing monitoring to ensure their performance remains satisfactory. This might also apply to situations where new versions of automated software are released. The implementation of automated technology in radiological clinical practice would need to take into account any associated training, strategic, regulatory, performance, technical, IT infrastructure, or economic considerations, including the ability to implement at scale across different NHS trusts and regions.

Limitations

Overall, there is a paucity of evidence evaluating similar versions of automated/semi-automated software against manual breast density assessment using similar editions of BI-RADS, making it difficult to draw firm conclusions on the concordance of automated software with manual assessment, particularly for newer versions or less common software. While we tried to ensure that the included studies are representative of the UK general breast cancer screening population, confirming the eligibility of study populations was challenging because the term ‘screening’ is often used in different ways by study authors. For example, some use it to refer to imaging for breast cancer detection in the general screening population, while others use it when imaging is used for surveillance to detect recurrent or second primary breast cancer. Consequently, it is possible that, in some included studies, fewer than 60% of participants came from the general screening population, which was our predefined eligibility threshold. It is also possible that some relevant studies have been excluded because we were unable to establish the screening characteristics of their populations.

The applicability of our findings to minority ethnic and other under-served groups is uncertain because of the poor reporting of participants’ ethnic and socioeconomic characteristics in the included studies. This lack of information could be problematic if the automated technologies were trained on datasets that excluded select groups. It is, therefore, unclear whether our findings are truly representative of a broad screening population, highlighting the need for more inclusive research and transparent reporting. Future research studies should provide clearer descriptions of the novel aspects of the considered technologies and characteristics of the participants, as well as consider the broader implications of the introduction of automated tools within the healthcare system.

Implications for policy and practice

Automated breast density measurement tools show promise for integration into breast screening programmes. However, concerns about the generalisability of the current evidence base, together with the practical challenges of implementing new systems in clinical practice, currently preclude firm policy recommendations. The finding that Volpara tended to classify more women as having dense breasts may also have important resource implications if adopted in practice. In the UK, the National Screening Committee (UK NSC) has established an expert breast cancer working group to consider new and emerging evidence and developments that could improve breast screening programmes. The group has indicated support for future modelling work examining the clinical impact and costs of incorporating breast density into screening pathways, alongside new assessments for breast cancer risk, including AI-based methodologies. Breast density is therefore being actively considered within ongoing UK NSC work, which is evaluating its clinical and cost implications as part of future risk-based screening approaches.

Conclusions

Automated breast density measurement tools such as Volpara and Quantra show good agreement with manual BI-RADS classification and hold promise for integration into population breast screening programmes. However, heterogeneity between software types and versions, limited comparability across studies, under-reporting of participant demographics, and the rapid development of AI-based technologies constrain the generalisability of these findings.

Supplemental Material

sj-docx-1-msc-10.1177_09691413261447057 to sj-docx-6-msc-10.1177_09691413261447057 - Supplemental material for Concordance between automated/semi-automated measurement and manual assessment of mammographic breast density in individuals undergoing breast cancer screening: A systematic review

Supplemental material, sj-docx-1-msc-10.1177_09691413261447057 to sj-docx-6-msc-10.1177_09691413261447057, for Concordance between automated/semi-automated measurement and manual assessment of mammographic breast density in individuals undergoing breast cancer screening: A systematic review by Clare Robertson, David Cooper, Sinéad N Duggan, Paul Manson, Mari Imamura, Rodolfo Hernández, Mike Clarke, Shaun Treweek and Miriam Brazzelli in Journal of Medical Screening

Footnotes

Acknowledgements

We are grateful to the following members of the Advisory group: Lesley Anderson (Aberdeen Centre for Health Data Science, University of Aberdeen), Ruth Burns (Patient Partner), Debra Dulake (Patient Partner), John Marshall (National Screening Committee), Cristina Visintin (National Screening Committee), and Shantini Paranjothy (Public Health, NHS Grampian).

We are grateful to the authors of included studies who kindly responded to our enquiries regarding specific aspects of their work.

Author contributions

MB, MC, SND, PM, CR, and ST contributed to the development of the protocol. CR conducted the review (screening of search results, data extraction and quality assessment). SND participated in the screening of search results. MI conducted the data extraction check. DC conducted the statistical analyses. CR drafted the initial version of this manuscript. PM conducted the literature search and contributed to writing the manuscript. All authors contributed to the interpretation of the data and to writing the manuscript, and had the opportunity to approve the final version of the manuscript.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This review was funded by the National Institute for Health and Care Research (NIHR) Evidence Synthesis Programme (Project number NIHR164221) and commissioned by the United Kingdom National Screening Committee (UK NSC). The UK NSC provides screening recommendations to the UK government, based on a range of criteria, including a consistent approach to the identification and synthesis of relevant literature. The UK NSC and NIHR have recently agreed on a formal collaboration to ensure the quality of evidence reports. A report of the findings was submitted to the UK NSC, as commissioners of this work, to inform their evidence base. The NIHR Evidence Synthesis Programme had no role in the collection, analysis, or interpretation of the data; in the writing of the report; or in the decision to submit the article for publication. The UK NSC, as commissioners of the work, participated in the Advisory Group and provided expert input on study design, data interpretation, and the current screening pathway. However, they had no role in writing this manuscript, formulating the conclusions, or deciding to submit it for journal publication.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data availability

All the data upon which the review is based are available in the supplementary files (Appendices 1-6).

Supplemental material

Supplemental material for this article is available online.

References

1. Cancer Research UK. Cancer statistics for the UK. Available from: https://www.cancerresearchuk.org/health-professional/cancer-statistics-for-the-uk (Accessed January 2024).

2. Lauby-Secretan, Loomis, Straif. Breast cancer screening: viewpoint of the IARC Working Group. N Engl J Med 2015; 373: 1479.

3. Njor, Nyström, Moss, et al. Breast cancer mortality in mammographic screening in Europe: a review of incidence-based mortality studies. J Med Screen 2012; 19: 33–41.

4. Independent UK Panel on Breast Cancer Screening. The benefits and harms of breast cancer screening: an independent review. Lancet 2012; 380: 1778–1786.

5. Edmonds, O'Brien, Conant. Mammographic breast density: current assessment methods, clinical implications, and future directions. Semin Ultrasound CT MR 2023; 44: 35–45.

6. American College of Radiology. ACR BI-RADS Atlas, Breast Imaging Reporting and Data System 2013. 2013. Available from: https://www.acr.org/Clinical-Resources/Reporting-and-Data-Systems/Bi-Rads (Accessed January 2024).

7. Sickles, D’Orsi, Bassett, et al. ACR BI-RADS Atlas, Breast Imaging Reporting and Data System. 5th ed. Reston, VA: American College of Radiology, 2013.

8. Radhakrishna, Agarwal, Parikh, et al. Role of magnetic resonance imaging in breast cancer management. South Asian J Cancer 2018; 7: 69–71.

9. Brady, Allen, Chong, et al. Developing, purchasing, implementing and monitoring AI tools in radiology: practical considerations. A multi-society statement from the ACR, CAR, ESR, RANZCR & RSNA. Insights Imaging 2024; 15: 16.

10. Patterson, Stinton, Alkhudairy, et al. Additional screening with ultrasound after negative mammography screening in women with dense breasts: a systematic review. London: National Screening Committee, 2019.

11. Higgins, Thomas, Chandler, et al. Cochrane Handbook for Systematic Reviews of Interventions version 6.4 (updated August 2023). 2024. Available from: www.training.cochrane.org/handbook (Accessed December 2024).

12. Page, McKenzie, Bossuyt, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. Br Med J 2021; 372: n71.

13. American College of Radiology. ACR BI-RADS Atlas, Breast Imaging Reporting and Data System. 3rd ed. Reston, VA: American College of Radiology, 1998.

14. D'Orsi, Bassett, Berg. ACR BI-RADS Atlas, Breast Imaging Reporting and Data System. 4th ed. Reston, VA: American College of Radiology, 2003.

15. Trial Forge. PRO EDI participant characteristics table 22/3/2024. 2024. Available from: https://www.trialforge.org/trial-diversity/pro-edi-improving-how-equity-diversity-and-inclusion-is-handled-in-evidence-synthesis/ (Accessed October 2024).

16. Centre for Reviews and Dissemination. Systematic reviews: CRD's guidance for undertaking systematic reviews in health care. York, England: University of York, 2009. Available from: https://www.york.ac.uk/media/crd/Systematic_Reviews.pdf (Accessed March 2024).

17. Verhagen, de Vet, de Bie, et al. The Delphi list: a criteria list for quality assessment of randomized clinical trials for conducting systematic reviews developed by Delphi consensus. J Clin Epidemiol 1998; 51: 1235–1241.

18. Downs, Black. The feasibility of creating a checklist for the assessment of the methodological quality both of randomised and non-randomised studies of health care interventions. J Epidemiol Community Health 1998; 52: 377–384.

19. Jackson, Ameratunga, Broad, et al. The GATE frame: critical appraisal with pictures. Evid Based Med 2006; 11: 35–38.

20. Gemici, Aribal, Ozaydin, et al. Comparison of qualitative and volumetric assessments of breast density and analyses of breast compression parameters and breast volume of women in Bahcesehir mammography screening project. Meme Sagligi Dergisi/J Breast Health 2020; 16: 110–116.

21. Eom, Cha, Kang, et al. Comparison of variability in breast density assessment by BI-RADS category according to the level of experience. Acta Radiol 2018; 59: 527–532.

22. Singh, Sharma, Singla, et al. Breast density estimation with fully automated volumetric method: comparison to radiologists assessment by BI-RADS categories. Acad Radiol 2016; 23: 78–83.

23. Rigaud, Weaver, Dennison, et al. Deep learning models for automated assessment of breast density using multiple mammographic image types. Cancers (Basel) 2022; 14: 5003.

24. StataCorp. Stata Statistical Software: Release 19. College Station, TX: StataCorp LLC, 2025.

25. Balleyguier, Arfi-Rouche, Boyer, et al. A new automated method to evaluate 2D mammographic breast density according to BI-RADS Atlas Fifth Edition recommendations. Eur Radiol 2019; 29: 3830–3838.

26. Carneiro, Franco MLN, De Lima Thomaz, et al. Breast density pattern characterization by histogram features and texture descriptors. Revista Brasileira de Engenharia Biomedica 2017; 33: 69–77.

27. Couwenberg, Verkooijen, et al. Assessment of a fully automated, high-throughput mammographic density measurement tool for use with processed digital mammograms. Cancer Causes Control 2014; 25: 1037–1043.

28. Lehman, Yala, Schuster, et al. Mammographic breast density assessment using deep learning: clinical implementation. Radiology 2019; 290: 52–58.

29. Lin, et al. Automatic mammographic breast density classification in Chinese women: clinical validation of a deep learning model. Acta Radiol 2023; 64: 1823–1830.

30. Matthews, Singh, Mombourquette, et al. A multisite study of a breast density deep learning model for full-field digital mammography and synthetic mammography. Radiol Artif Intell 2021; 3: e200015.

31. Wei, Chan, et al. Computer-aided assessment of breast density: comparison of supervised deep learning and feature-based statistical learning. Phys Med Biol 2018; 63: 025005.

32. Wei, Chan, Helvie, et al. Radiomic modeling of BI-RADS density categories. Prog Biomed Opt Imaging - Proc SPIE 2017; 10134: 101340P.

33. Kai, Ishizuka, Otsuka, et al. Automated estimation of mammary gland content ratio using regression deep convolutional neural network and the effectiveness in clinical practice as explainable artificial intelligence. Cancers (Basel) 2023; 15: 2794.

34. Liu, et al. Multi-view mammographic density classification by dilated and attention-guided residual learning. IEEE/ACM Trans Comput Biol Bioinform 2021; 18: 1003–1013.

35. Khan, Wang, Chan, et al. Automatic BI-RADS classification of mammograms. Lect Notes Comput Sci 2016; 9431: 475–487.

36. Watanabe, Retson, Wang, et al. Mammographic breast density model using semi-supervised learning reduces inter-/intra-reader variability. Diagnostics 2023; 13: 2694.

37. Yamamuro, Asai, Hashimoto, et al. Utility of U-Net for the objective segmentation of the fibroglandular tissue region on clinical digital mammograms. Biomed Phys Eng Express 2022; 8: 045016.

38. Youk, Gweon, Son, et al. Automated volumetric breast density measurements in the era of the BI-RADS fifth edition: a comparison with visual assessment. AJR Am J Roentgenol 2016; 206: 1056–1062.

39. Singh, Joshi, Singh, et al. Volumetric breast density evaluation using fully automated Volpara software, its comparison with BIRADS density types and correlation with the risk of malignancy. Egyptian J Radiol Nuclear Med 2022; 53: 118.

40. Vállez, Bueno, Déniz, et al. Breast density classification to reduce false positives in CADe systems. Comput Methods Programs Biomed 2014; 113: 569–584.

41. Shao, Liu, et al. The correlation between mammographic densities and molecular pathology in breast cancer. Cancer Biomark 2018; 22: 523–531.

42. Nykanen, Okuma, Sutela, et al. The mammographic breast density distribution of Finnish women with breast cancer and comparison of breast density reporting using the 4th and 5th editions of the Breast Imaging-Reporting and Data System. Eur J Radiol 2021; 137: 109585.

43. Dontchos, Yala, Barzilay, et al. External validation of a deep learning model for predicting mammographic breast density in routine clinical practice. Acad Radiol 2021; 28: 475–480.

44. Oshima, Shinohara, Kamiya. Investigation of the effect of image resolution on automatic classification of mammary gland density in mammography images using deep learning. Proc SPIE-Int Soc Opt Eng 2019; 11050: 1105018.

45. Fisher, Wei, et al. Multi-path deep learning model for automated mammographic density categorization. Prog Biomed Opt Imaging - Proc SPIE 2019; 10950: 109502E.

46. Chen, Ruth, Zhang, et al. Breast density assessment: image feature extraction and density classification with machine intelligence. Prog Biomed Opt Imaging - Proc SPIE 2018; 10573: 105735F.

47. Kim, et al. Mammographic density estimation with automated volumetric breast density measurement. Korean J Radiol 2014; 15: 313–321.

48. Testagrose, Gupta, Erdal, et al. Impact of concatenation of digital craniocaudal mammography images on a deep-learning breast-density classifier using Inception-V3 and ViT. IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2022. Las Vegas, NV: Institute of Electrical and Electronics Engineers (IEEE), 2022.

49. Tsuchida, Negishi, Takahashi, et al. Dense-breast classification using image similarity. Radiol Phys Technol 2020; 13: 177–186.

50. Ekpo, Hogg, Wasike, et al. A self-directed learning intervention for radiographers rating mammographic breast density. Radiography 2017; 23: 337–342.

51. Kaiser, Fieselmann, Vesal, et al. Mammographic breast density classification using a deep neural network: assessment based on inter-observer variability. Prog Biomed Opt Imaging - Proc SPIE 2019; 10952: 109520O.

52. Osteras, Martinsen ACT, Brandal SHB, et al. BI-RADS density classification from areometric and volumetric automatic breast density measurements. Acad Radiol 2016; 23: 468–478.

53. Pawar, Sharma, Sapate, et al. Multichannel DenseNet architecture for classification of mammographic breast density for breast cancer detection. Front Public Health 2022; 10: 885212.

54. Chugh, Goyal, Pandey, et al. Morphological and Otsu's technique based mammography mass detection and deep neural network classifier based prediction. Trait Signal 2022; 39: 1283–1294.

55. Bandeira Diniz, Azevedo Valente, et al. Detection of mass regions in mammograms by bilateral analysis adapted to breast density using similarity indexes and convolutional neural networks. Comput Methods Programs Biomed 2018; 156: 191–207.

56. Gamdonkar, Tay, Ryder, et al. IDensity: an automatic Gabor filter-based algorithm for breast density assessment. Prog Biomed Opt Imaging - Proc SPIE 2015; 9416: 941607.

57. Kaliyaperumal, Selvarajan. Automated characterization of mammographic density for early detection of breast cancer risk. Int J Simul Syst Sci Technol 2014; 15: 56–63.

58. Lin, Wei, et al. Deep-learning-based semantic labeling for 2D mammography and comparison of complexity for machine learning tasks. J Digit Imaging 2019; 32: 565–570.

59. Tiryaki, Kaplanoğlu. Deep learning-based multi-label tissue segmentation and density assessment from mammograms. IRBM 2022; 43: 538–548.

60. Trivizakis, Ioannidis, Melissianos, et al. A novel deep learning architecture outperforming ‘off–the–shelf’ transfer learning and feature–based methods in the automated assessment of mammographic breast density. Oncol Rep 2019; 42: 2009–2015.

61. Simon, Lavanya, Vijayan. PSO based density classifier for mammograms. IFMBE Proc 2017; 61: 62–66.

62. Sharma, Singh. CFS-SMO based classification of breast density using multiple texture models. Med Biol Eng Comput 2014; 52: 521–529.

63. Lee, Goh YLE, Lai. Classification of mammographic breast density and its correlation with BI-RADS in elder women using machine learning approach. J Med Imaging Radiat Sci 2022; 53: 28–34.

64. Heath, Bowyer, Kopans, et al. The Digital Database for Screening Mammography. In: Yaffe (ed.) Proceedings of the Fifth International Workshop on Digital Mammography. Madison, WI: Medical Physics Publishing, 2001, pp. 212–218.

65. Suckling. The mammographic images analysis society digital mammogram database. Excerpta Med Int Congr Ser 1994; 1069: 375–378.

66. Fieselmann, Fornvik, et al. Volumetric breast density measurement for personalized screening: accuracy, reproducibility, consistency, and agreement with visual assessment. J Med Imaging (Bellingham) 2019; 6: 031406.

67. Fornvik, Fieselmann, et al. Comparison between software volumetric breast density estimates in breast tomosynthesis and digital mammography images in a large public screening cohort. Eur Radiol 2019; 29: 330–336.

68. Sartor, Lang, Rosso, et al. Measuring mammographic density: comparing a fully automated volumetric assessment versus European radiologists’ qualitative classification. Eur Radiol 2016; 26: 4354–4360.

69. Timberg, Fieselmann, Dustler, et al. Breast density assessment using breast tomosynthesis images. Lecture Notes in Computer Science; 2016.

70. Holland, van Zelst, den Heeten, et al. Consistency of breast density categories in serial screening mammograms: a comparison between automated and human assessment. Breast 2016; 29: 49–54.

71. Van Der Waal, Heeten, Pijnappel, et al. Comparing visually assessed BI-RADS breast density and automated volumetric breast density software: a cross-sectional study in a breast cancer screening setting. PLoS One 2015; 10: e0136667.

72. Le Boulc'h, Bekhouche, Kermarrec, et al. Comparison of breast density assessment between human eye and automated software on digital and synthetic mammography: impact on breast cancer risk. Diagn Interv Imaging 2020; 101: 811–819.

73. Osteras, Martinsen ACT, Brandal SHB, et al. Classification of fatty and dense breast parenchyma: comparison of automatic volumetric density measurement and radiologists’ classification and their inter-observer variation. Acta Radiol 2016; 57: 1178–1185.

74. Lee, Nishikawa. Automated mammographic breast density estimation using a fully convolutional network. Med Phys 2018; 45: 1178–1190.

75. Portnow, Georgian-Smith, Haider, et al. Persistent inter-observer variability of breast density assessment using BI-RADS 5th edition guidelines. Clin Imaging 2022; 83: 21–27.

76. Lee, Sohn, Han. Comparison of mammographic density estimation by Volpara software with radiologists’ visual assessment: analysis of clinical-radiologic factors affecting discrepancy between them. Acta Radiol 2015; 56: 1061–1068.

77. Youk, Gweon, Son, et al. Fully automated measurements of volumetric breast density adapted for BIRADS 5th edition: a comparison with visual assessment. Acta Radiol 2021; 62: 1148–1154.

78. Lee, Son, Kim, et al. Mammographic density assessment by artificial intelligence-based computer-assisted diagnosis: a comparison with automated volumetric assessment. J Digit Imaging 2022; 35: 173–179.

79. Angulo, Ferrer, Pinto, et al. Experimental assessment of an automatic breast density classification algorithm based on principal component analysis applied to histogram data. Prog Biomed Opt Imaging - Proc SPIE 2015; 9287: 92870E.

80. Fonseca, Mendoza, Wainer, et al. Automatic breast density classification using a convolutional neural network architecture search procedure. Prog Biomed Opt Imaging - Proc SPIE 2015; 9414: 941428.

81. Pesce, Tajerian, Chico, et al. Interobserver and intraobserver variability in determining breast density according to the fifth edition of the BI-RADS atlas. Radiologia (Roma) 2020; 62: 481–486.

82. Ekpo, McEntee, Rickard, et al. Quantra™ should be considered a tool for two-grade scale mammographic breast density classification. Br J Radiol 2016; 89 (1060).

83. Pavan ALM, Vacavant, Trindade, et al. Fibroglandular tissue quantification in mammography by optimized fuzzy C-means with variable compactness. IRBM 2017; 38: 228–233.

84. Aloufi, AlNaeem, Almousa, et al. Breast density distribution among the Saudi screening population and correlation between radiologist visual assessment and two automated methods. Prog Biomed Opt Imaging - Proc SPIE 2022; 12035: 120350L.

85. Alomaim, O'Leary, Ryan, et al. Subjective versus quantitative methods of assessing breast density. Diagnostics 2020; 10: 331.

86. Zackrisson, Lang, Rosso, et al. One-view breast tomosynthesis versus two-view mammography in the Malmo breast tomosynthesis screening trial (MBTST): a prospective, population-based, diagnostic accuracy study. Lancet Oncol 2018; 19: 1493–1503.

87. Aloufi, Alnaeem, Almousa, et al. Mammographic breast density and breast cancer risk in the Saudi population: a case-control study using visual and automated methods. Br J Radiol 2022; 95: A16.

88. Altman. Practical statistics for medical research. New York: Chapman & Hall/CRC Press, 1999.

89. Cohen. Statistical power analysis for the behavioral sciences. 2nd ed. Hillsdale, NJ: Lawrence Erlbaum Associates, 1988.

90. Hosmer Jr, Lemeshow, Sturdivant. Assessing the fit of the model. In: Hosmer Jr, Lemeshow, Sturdivant (eds) Applied Logistic Regression. 3rd ed. Hoboken, NJ: John Wiley & Sons, Inc, 2013, pp. 143–202.

91. Luo, Wang, et al. Knowledge and awareness of generative artificial intelligence use in medicine among international stakeholders: a cross-sectional study. J Evid Based Med 2025; 18: e70034.

92. Rosbach, Ganz, Ammeling, et al. Automation bias in AI-assisted medical decision-making under time pressure in computational pathology. arXiv 2024. Available from: https://arxiv.org/abs/2411.00998v1.

93. Lyell, Magrabi, Raban, et al. Automation bias in electronic prescribing. BMC Med Inform Decis Mak 2017; 17: 28.

94. Kücking, Hübner, Przysucha, et al. Automation bias in AI-decision support: results from an empirical study. Stud Health Technol Inform 2024; 317: 298–304.
