Abstract
Purpose:
The purpose of this study was to identify personal, systemic, and, in particular, geosocial risk factors for breast cancer screening nonadherence and to assess how machine learning techniques can improve cancer screening rates.
Materials and methods:
The study included 21 543 women aged 50 to 74 years with a primary care provider at 15 family medicine clinics in southcentral Pennsylvania between January 1, 2021, and December 31, 2022. Patient addresses were geocoded to Census block groups to obtain neighborhood socioeconomic measures, and demographic and healthcare utilization data were analyzed using multiple logistic regression to assess their association with adherence to breast cancer screening guidelines. The area deprivation index (ADI), a composite measure integrating factors such as poverty, employment, education, and housing quality, was extracted using machine learning.
Results:
Women in minority racial groups, specifically the “Other” race category (aOR: 0.659, P < .01), had decreased odds of being screened for breast cancer compared with non-Hispanic White women, whereas Hispanic women had increased odds (aOR: 1.456, P < .01). Uninsured women (aOR: 0.334, P < .01) and those on Medicaid (aOR: 0.793, P < .01) were at greater risk of nonadherence to screening. Lastly, higher ADI was associated with lower odds of screening (aOR: 0.995, P < .01).
Conclusions:
Social determinants of health influence a person’s likelihood of being screened for breast cancer. Identifying the barriers these individuals face, and the ways in which those barriers compound, is a crucial step in improving screening adherence.
Introduction
Breast cancer is the second most common cancer and the second most common cause of cancer death among women in the United States. In 2022, over 270 000 new cases of breast cancer were diagnosed, with over 42 000 deaths. 1 Screening for breast cancer is an evidence-based strategy for early detection that can effectively prevent disease progression and death and improve quality of life. Yet both the breast cancer screening rate (<72%) and the breast cancer mortality rate (21.4 per 100 000) in Pennsylvania are worse than the national averages. Lower breast cancer screening rates have also been found to be disproportionately associated with socioeconomic and neighborhood-level characteristics. 2 The COVID-19 pandemic imposed additional challenges for breast cancer screening because of difficulty accessing routine health screening services. Prolonged intervals between screenings, or missed screenings altogether, have raised concerns about potentially serious health consequences and widening cancer disparities among women already experiencing health inequities.
CDC guidelines state that women aged 40 to 74 years at average risk should receive a mammogram every 2 years. Because the recommended starting age was lowered from 50 to 40 only in April 2024, after the study period, this study included women aged 50 to 74. 3 Social determinants of health such as distance to screening facilities, health insurance status, and transportation influence screening access. A 2016 study found that health insurance type, income, and primary care visits influence screening likelihood. 4 Education level is also important, with women of higher educational attainment showing 36% higher adherence to guidelines. 5 The Area Deprivation Index (ADI) is a multidimensional measure of the socioeconomic conditions of a neighborhood or geographic area. It can be used to summarize how factors such as income, education level, and housing quality affect health outcomes.
While national data show that 76% of women in the United States follow the CDC recommendation for mammography every 2 years, Pennsylvania reports lower adherence to regular screening. 2 Existing research has hypothesized barriers that keep women from adhering to regular breast cancer screening, but further exploration is needed to determine which factors are most strongly associated with mammography screening. Defining the social determinants of health that influence a woman’s likelihood of being screened, as well as the barriers individuals may face, allows for a more informed process of crafting solutions to improve screening rates. Furthermore, limited research has evaluated how machine learning can be used to optimize breast cancer screening adherence by studying metrics of social determinants of health. The goal of this study is to gather clinical and geosocial information to allow calculation of a neighborhood deprivation index and to gauge its relation to breast cancer screening adherence. Promotion of cancer screening adherence through patient navigation is an evidence-based strategy that takes into account the multifaceted nature of social determinants of health. 6 There is also growing evidence that cancer interventions integrated with informatics-driven decision tools using electronic health records (EHRs) and predictive modeling can further improve breast cancer screening rates. 7 This study aimed to design precision navigation interventions for community-based primary care settings, grounded in the neighborhood context. The study examined 21 543 women aged 50 to 74 in southcentral Pennsylvania between January 1, 2021, and December 31, 2022, and assessed risk factors for breast cancer screening adherence.
Methods
Study Design and Setting
The study used a retrospective longitudinal design consisting of visits from women aged 50 to 74 years who were actively managed by a primary care provider (PCP) between January 1, 2021, and December 31, 2022, at 15 family medicine clinics of a regional academic medical center in southcentral Pennsylvania. Patient demographic and clinical characteristics were extracted from the health system’s electronic health record (EHR) database. Patients’ addresses were then geocoded and linked to corresponding Census data as a measure of neighborhood-level socioeconomic status. Breast cancer screening status was categorized as adherent or nonadherent according to the US Preventive Services Task Force (USPSTF) guidelines for breast cancer screening, which recommend that women aged 40 to 74 years at average risk for breast cancer receive a mammogram every 2 years. Individuals who were deceased or resided outside of Pennsylvania were excluded from the analysis. This research received approval from the Institutional Review Board of the academic medical center.
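The adherent/nonadherent classification can be illustrated with a simple date comparison. The sketch below is a minimal illustration, not the study's extraction logic: it assumes adherence is judged at a single reference date (here, hypothetically, December 31, 2022, the end of the study period) and that "every 2 years" is operationalized as a mammogram within the preceding 730 days. The function name `is_adherent` and the window length are assumptions.

```python
from datetime import date, timedelta
from typing import Optional

# Assumption: "every 2 years" operationalized as a 730-day look-back window.
LOOKBACK = timedelta(days=730)


def is_adherent(last_mammogram: Optional[date], reference_date: date) -> bool:
    """Hypothetical rule: adherent if last_mammogram falls within the
    look-back window ending on reference_date (inclusive)."""
    if last_mammogram is None:
        # No mammogram on record: classified as nonadherent.
        return False
    if last_mammogram > reference_date:
        # A mammogram after the reference date does not count at that date.
        return False
    return reference_date - last_mammogram <= LOOKBACK


if __name__ == "__main__":
    end_of_study = date(2022, 12, 31)
    print(is_adherent(date(2021, 3, 15), end_of_study))  # True
    print(is_adherent(date(2020, 6, 1), end_of_study))   # False
    print(is_adherent(None, end_of_study))               # False
```

Changing `LOOKBACK` changes who is counted as adherent, so it should match whatever operational definition the study actually used.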
Demographic and Comorbid Characteristics
Patient characteristics included age, sex, race/ethnicity, English-speaking status, health insurance type, and comorbid conditions. The age variable was calculated from each patient’s age at the end of the study period, which was in 2023. Only encounters in which the patient’s age at the time of the encounter fell within the study’s screening-eligible age range of 50 to 74 years were included in the analyses. Race/ethnicity categories were Hispanic, non-Hispanic White, non-Hispanic Black, Asian, and Other. Health insurance types included commercial, Medicare, Medicaid, and uninsured. The complexity of each patient’s health conditions was measured by the 29 medical, psychiatric, and lifestyle-related indicators of the Elixhauser comorbidity measure,8,9 which is considered a reliable proxy for more complex comorbidity indices. 10 Breast cancer risk was calculated using the NIH Gail model.
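The encounter-level age restriction can be sketched as follows. This is a minimal illustration assuming age is measured in completed years on the encounter date and that both bounds (50 and 74) are inclusive; the helper names `age_on` and `eligible_encounters` are hypothetical.

```python
from datetime import date
from typing import Iterable, List


def age_on(birth_date: date, on: date) -> int:
    """Age in completed years on a given date."""
    had_birthday = (on.month, on.day) >= (birth_date.month, birth_date.day)
    return on.year - birth_date.year - (0 if had_birthday else 1)


def eligible_encounters(birth_date: date, encounter_dates: Iterable[date],
                        min_age: int = 50, max_age: int = 74) -> List[date]:
    """Keep only encounters at which the patient's age was within
    [min_age, max_age] (assumed inclusive bounds)."""
    return [d for d in encounter_dates
            if min_age <= age_on(birth_date, d) <= max_age]


if __name__ == "__main__":
    born = date(1972, 6, 15)
    visits = [date(2022, 6, 14), date(2022, 6, 15)]
    print(eligible_encounters(born, visits))  # [datetime.date(2022, 6, 15)]
```

Filtering at the encounter level, rather than on a single age per patient, means a woman who turns 50 during the study period contributes only the encounters after her birthday.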
Healthcare Utilization
Healthcare utilization characteristics included in the analysis were prior appointment history, mode of visit (in-person, telemedicine), clinician type (physician, physician assistant/nurse practitioner), clinician’s years of practice, PCP status, and continuity of care indices. Prior appointment history was measured by the prior completed visit, no-show, and late cancellation rates, each defined as the ratio of the number of that outcome to the number of all prior primary care appointments within 3 years.
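The appointment-history ratios can be expressed directly from their definition. The sketch below assumes each appointment is recorded as a (date, outcome) pair with hypothetical outcome labels `"completed"`, `"no_show"`, and `"late_cancel"`, approximates the 3-year look-back as 3 × 365 days, and returns `None` when a patient has no prior appointments; all of these are illustrative choices rather than the study's specification.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Dict, Iterable, Optional, Tuple

OUTCOMES = ("completed", "no_show", "late_cancel")  # hypothetical labels


def appointment_rates(appointments: Iterable[Tuple[date, str]],
                      index_date: date,
                      years: int = 3) -> Dict[str, Optional[float]]:
    """Ratio of each outcome to all prior primary care appointments in the
    look-back window [index_date - years*365 days, index_date)."""
    window_start = index_date - timedelta(days=365 * years)  # ignores leap days
    prior = [outcome for appt_date, outcome in appointments
             if window_start <= appt_date < index_date]
    if not prior:
        return {k: None for k in OUTCOMES}  # undefined without prior visits
    counts = Counter(prior)
    return {k: counts[k] / len(prior) for k in OUTCOMES}


if __name__ == "__main__":
    history = [
        (date(2021, 1, 5), "completed"),
        (date(2021, 6, 1), "no_show"),
        (date(2022, 2, 1), "completed"),
        (date(2022, 8, 1), "late_cancel"),
        (date(2018, 1, 1), "completed"),  # outside the 3-year window
    ]
    print(appointment_rates(history, date(2022, 12, 31)))
    # {'completed': 0.5, 'no_show': 0.25, 'late_cancel': 0.25}
```

Because the denominator is all prior appointments, the three rates need not sum to 1 when other outcomes (for example, early cancellations) are present in the history.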
Geosocial Contexts
Patients’ home addresses were geocoded and linked to their corresponding Census block group. The rural status of each patient was determined by linking the geocoded address to the Census Urban and Rural Classification mapping file. 11 To assess neighborhood socioeconomic status, the Area Deprivation Index (ADI), which incorporates factors such as poverty, education, employment, and housing quality, was extracted at the Census block group level. 12 Additionally, 2 Census statistics, the percentage of individuals without access to a vehicle and the percentage of high school graduates, were obtained at the block group level as proxies for transportation availability and education, respectively.
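Linking geocoded addresses to block-group measures such as the ADI typically relies on Census GEOIDs. A 15-digit block GEOID is composed of state (2 digits), county (3), tract (6), and block (4) codes, and the first digit of the block code identifies the block group, so the 12-digit block group GEOID is a prefix of the block GEOID. The sketch below assumes the geocoder returns a 15-digit block GEOID and that ADI values are available in a lookup keyed by 12-digit block group GEOID; the function names, GEOIDs, and ADI values are hypothetical.

```python
from typing import Dict, Optional


def block_group_geoid(block_geoid: str) -> str:
    """Truncate a 15-digit Census block GEOID to its 12-digit block group
    GEOID (state 2 + county 3 + tract 6 + block group 1)."""
    if len(block_geoid) != 15 or not block_geoid.isdigit():
        raise ValueError(f"expected a 15-digit block GEOID, got {block_geoid!r}")
    return block_geoid[:12]


def link_adi(patient_blocks: Dict[str, str],
             adi_by_block_group: Dict[str, int]) -> Dict[str, Optional[int]]:
    """Map each patient to the ADI of their block group (None if missing)."""
    return {pid: adi_by_block_group.get(block_group_geoid(geoid))
            for pid, geoid in patient_blocks.items()}


if __name__ == "__main__":
    patients = {"p1": "420430213001000", "p2": "420430213002005"}  # hypothetical
    adi = {"420430213001": 55}  # hypothetical ADI lookup
    print(link_adi(patients, adi))  # {'p1': 55, 'p2': None}
```

Patients whose block group has no ADI value (as for `p2`) need an explicit handling rule, such as exclusion, which the sketch leaves to the caller.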
Statistical analysis
Descriptive statistics were computed to describe sample characteristics by outcome. Differences in continuous variables were compared using the t-test, and differences in proportions for categorical variables were evaluated using the chi-square test. Multiple logistic regression was used to assess the likelihood of adherence to breast cancer screening (binary dependent variable, Yes/No), controlling for demographic characteristics and comorbid conditions. The model was estimated by maximum likelihood, which provided regression coefficients, standard errors (SEs), Wald 95% confidence intervals (CIs) for the coefficients, and P-values for each model variable. The adjusted odds ratio (aOR) and 95% CI were also calculated for each variable to estimate its association with the outcome. Statistical significance was defined as a two-tailed P < .05. All statistical analyses were performed using the PROC LOGISTIC procedure (SAS version 9.4, SAS Institute Inc., Cary, NC).
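The relationship between a fitted logistic regression coefficient and the reported aOR, Wald 95% CI, and P-value can be reproduced by hand: aOR = exp(β), the Wald CI is exp(β ± 1.96·SE), and the two-tailed Wald P-value comes from z = β/SE. The sketch below is a minimal illustration of these standard formulas using only the Python standard library; it does not reproduce SAS output formatting, and the function name is hypothetical.

```python
import math
from typing import Dict


def odds_ratio_summary(beta: float, se: float,
                       z_crit: float = 1.959964) -> Dict[str, float]:
    """Convert a logistic regression coefficient and its standard error into
    an odds ratio, Wald 95% CI, and two-tailed Wald P-value."""
    z = beta / se
    return {
        "aOR": math.exp(beta),
        "ci_low": math.exp(beta - z_crit * se),
        "ci_high": math.exp(beta + z_crit * se),
        # Two-tailed P from the standard normal: 2 * (1 - Phi(|z|)).
        "p": math.erfc(abs(z) / math.sqrt(2)),
    }


if __name__ == "__main__":
    # Illustrative inputs only: a coefficient of ln(2) with SE 0.1.
    print(odds_ratio_summary(math.log(2), 0.1))
```

Because the CI is symmetric on the log-odds scale, it is asymmetric around the aOR itself. For example, the reported aOR of 6.007 corresponds to a coefficient of about 1.79 (ln 6.007).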
Results
Study Population
The analysis included 21 543 women who met the breast cancer screening eligibility criteria during the study period, of whom 7299 (33.9%) did not undergo breast cancer screening. Compared with those not screened, a greater percentage of screened individuals were older, English speakers, and non-Hispanic White (Table 1). A higher percentage of screened individuals also had commercial health insurance coverage, lived in less socioeconomically deprived areas, and lived in areas with higher educational attainment and greater access to transportation resources. While overall comorbidity burden did not differ between groups, a substantially greater proportion of the screened group had undergone prior breast cancer screening, and the screened group had a higher rate of completed visits and a lower rate of missed appointments.
Table 1. Descriptive Characteristics of Women Aged 50 to 74 Eligible for Breast Cancer Screening in Pennsylvania, 2021 to 2022 (N = 21 543).
Likelihood of Adherence to Breast Cancer Screening
The logistic regression results identified several significant factors associated with the likelihood of receiving breast cancer screening. Prior breast cancer screening had the strongest association: previously screened individuals had about 6 times the odds of being screened again (aOR: 6.007, P < .01). Higher breast cancer risk was also strongly associated with screening (aOR: 3.809, P < .01). A higher rate of completed visits in the past was associated with greater odds of screening (aOR: 1.06, P < .01), and a higher rate of missed appointments with lower odds (aOR: 0.888, P < .01) (Figure 1). Living farther from imaging centers (aOR: 0.989, P < .01) and residing in areas with higher ADI (aOR: 0.995, P < .01) were both negatively associated with screening.
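Two points about reading these aORs can be made concrete. First, an aOR for a continuous predictor applies per one-unit increase, so a k-unit change multiplies the odds by aOR^k, assuming the predictor enters the model linearly on the log-odds scale (the units for distance and ADI are not stated here). Second, an odds ratio is not a risk ratio: 6 times the odds does not mean 6 times the probability. The sketch below illustrates both; the baseline probability of 0.5 is hypothetical, chosen only for illustration.

```python
def scale_odds_ratio(or_per_unit: float, units: float) -> float:
    """Odds ratio for a change of `units` in a continuous predictor,
    assuming a linear term on the log-odds scale."""
    return or_per_unit ** units


def apply_odds_ratio(baseline_prob: float, odds_ratio: float) -> float:
    """Probability obtained by multiplying the baseline odds by odds_ratio."""
    odds = baseline_prob / (1 - baseline_prob) * odds_ratio
    return odds / (1 + odds)


if __name__ == "__main__":
    # ADI aOR of 0.995 per unit, applied to a 10-unit increase: ~0.951.
    print(round(scale_odds_ratio(0.995, 10), 3))
    # aOR of 6.007 applied to a hypothetical baseline probability of 0.5: ~0.857.
    print(round(apply_odds_ratio(0.5, 6.007), 3))
```

With that hypothetical baseline, the probability of screening rises from 0.50 to about 0.86, far less than a sixfold increase; the size of the absolute change depends on the baseline.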

Demographic effect on non-adherence to breast cancer screening.
Compared to those with commercial health insurance, individuals who were uninsured (aOR: 0.334, P < .01) or on Medicaid (aOR: 0.793, P < .01) had significantly lower odds of receiving screening. Compared to non-Hispanic Whites, Hispanic individuals had higher odds of screening (aOR: 1.456, P < .01), while individuals in the “Other” race group had lower odds (aOR: 0.659, P < .01). Age, rurality, education level, and English-speaking status were not significantly associated with screening after adjusting for other factors (Figure 2).

Health system effect on non-adherence to breast cancer screening.
Discussion
This study examined factors influencing breast cancer screening adherence by leveraging EHRs and predictive modeling, with a focus on incorporating demographic, clinical, and social determinants of health metrics to optimize screening uptake. In our study, women whose primary language was not English were less likely to have been screened in unadjusted comparisons, although English-speaking status was not significantly associated with screening after adjustment for other factors. A previous study found that Spanish-only speakers had 33% lower odds of screening than English speakers after propensity score matching. 13 Another study involving Asian American adults found that a language preference other than English, combined with a lack of language-concordant providers, was associated with significantly reduced mammography completion. 14 These findings suggest that language barriers can decrease awareness of screening, complicate appointment scheduling, and hinder follow-up care unless mitigated by interpreter services and/or culturally tailored outreach (Figure 3).

Geosocial effect on non-adherence to breast cancer screening.
We also found that Hispanic patients had higher odds of breast cancer screening. This finding contrasts with national analyses showing lower or variable mammography rates among Hispanic women compared with non-Hispanic White women. 15 However, recent analyses suggest that these disparities have narrowed in certain regions, likely because of targeted outreach and culturally informed interventions. 16
Health insurance type has been found to be one of the most consistent predictors of preventive service utilization. National Health Interview Survey data show that uninsured women are significantly less likely to undergo breast cancer screening than women with private insurance, Medicaid, or Medicare coverage. 16 The Oregon Medicaid Experiment showed a 60% increase in the likelihood of mammography after women gained Medicaid coverage. 17 These disparities likely reflect cost barriers, limited provider networks, and logistical challenges, which could be mitigated through policy interventions such as eliminating cost-sharing and expanding coverage.
Women living in socioeconomically disadvantaged neighborhoods were less likely to adhere to breast cancer screening recommendations. A multi-ethnic cohort study found that renting a home, food insecurity, and overcrowding were significantly associated with decreased mammography adherence. 18 Greater travel distance to imaging facilities has also been shown to reduce participation in screening programs. Although our findings revealed no rural–urban gap, prior research indicates that rural or urban residence alone is less predictive than the combination of geographic, socioeconomic, and access-related factors.
Missed screening mammography appointments (no-shows) are strongly linked with persistent lapses in care: in one community health center, about 12% of scheduled patients failed to attend, and 40% of those individuals remained unscreened a year later. 19 Other research shows that mammography has one of the highest no-show rates among radiology services, with longer scheduling lead times significantly increasing the likelihood of missed appointments. 20 These patterns suggest that prior no-shows are an important predictor of future nonadherence.
The findings of this study point to potential interventions that could improve overall breast cancer screening rates. For instance, to address the screening gap among women who are uninsured or on Medicaid, financial assistance services or efforts to reduce out-of-pocket costs for mammography could make a meaningful difference. For women without access to a vehicle or those residing in rural areas, expanding mobile mammography services may improve access. Because language barriers also appear to play a role in screening disparities, improving both the availability and quality of translation services in clinics could be an effective strategy. Lastly, because previously screened women are more likely to continue screening, efforts could focus on encouraging first-time screening, for example by designating appointment slots for patients who have never had a mammogram or by sending targeted reminders prompting them to schedule mammography.
Study Limitations
Although the study included a large sample, it was conducted within a single academic medical center, which limits the generalizability of the findings. In addition, social determinants of health measured at the individual level were unavailable because they were not captured in the EHR. Instead, neighborhood-level Census data, derived from patients’ residential addresses, were used as proxies for socioeconomic context. These aggregated statistics describe the general characteristics of a specific geographic area and provide reliable, representative information about the neighborhood’s population. However, they may not fully reflect differences between individuals living in the same area, which could reduce the predictive accuracy of the models. 21 Because the patient was the unit of analysis, the predictive models did not include encounter-specific variables, such as appointment lead time, weather conditions, or other broader contextual factors, that may also affect breast cancer screening rates. Despite these limitations, the study provides important insights into factors affecting breast cancer screening adherence and highlights opportunities to enhance predictive modeling with more granular, context-specific data in future research.
Conclusion
In this study, we aimed to identify geosocial factors affecting breast cancer screening nonadherence and to examine the implications of machine learning techniques for improving screening rates. Our findings indicate that social determinants of health play a significant role in an individual’s ability to access mammography services. Determinants such as English proficiency, distance to a screening facility, and level of education may not only act as barriers on their own but may also compound one another. To improve breast cancer screening rates, future research should aim to develop personalized, targeted interventions that address the specific needs of at-risk populations identified through data-driven analyses. A more in-depth understanding of these factors can support more specific efforts to close breast cancer screening gaps.
Footnotes
Acknowledgements
None.
Ethical Considerations
This research received approval from the Institutional Review Board of the academic medical center.
Consent to Participate
This was a retrospective study using de-identified electronic health records. Informed consent was not requested for the study.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability
Not available to the public.
