Abstract
Introduction
Prostate cancer (CaP) is the second most common cancer in men globally and a leading cause of cancer-related mortality, particularly among older men. In the United States, disparities in incidence, stage at diagnosis, and outcomes persist across racial, socioeconomic, and geographic lines. Men in medically underserved regions like Appalachia experience higher mortality due to limited access to timely screening and treatment. Given the complex interplay of clinical, behavioral, and sociodemographic factors, machine learning (ML) offers promise in identifying survival predictors that traditional models may overlook. This study applies ML models to assess the impact of age at diagnosis, treatment modality, and other sociodemographic and clinical factors on CaP survival outcomes using data from the Kentucky Cancer Registry (KCR).
Methods
We retrospectively analyzed 37 893 CaP cases diagnosed from 2010 to 2022 using KCR data linked to mortality records. Kaplan-Meier (KM), Random Survival Forest (RSF), and Elastic Net regression were used to estimate survival, assess variable importance, and evaluate predictive performance. Missing data were handled via multiple imputation, and leave-one-out cross-validation minimized overfitting. Models were compared using out-of-bag error, continuous ranked probability scores (CRPS), and interpretability.
Results
ML models identified age at diagnosis, treatment modality, and smoking status as the top survival predictors. RSF showed that age was approximately 2.5 times more influential than treatment type. Patients diagnosed before age 60 were more likely to undergo surgery and had lower mortality, while older men more often received non-surgical therapies and experienced worse outcomes. Insurance status, tumor grade, lymph node involvement, and marital status also affected survival, with pronounced disparities among uninsured and government-insured patients.
Conclusions
ML models highlighted age, smoking, and treatment type as key predictors of CaP survival. Findings support early screening, equitable treatment access, and behavioral health integration to reduce disparities in high-risk areas such as Kentucky and Appalachia.
Keywords
Introduction
Prostate cancer (CaP) is both one of the most prevalent cancers in men, with an estimated 1.5 million new cases diagnosed annually worldwide, and a leading cause of cancer-related deaths, with approximately 397 000 reported globally in 2022.1-3 Specifically, in the United States (U.S.), 313 780 new cases and 35 770 deaths were predicted in 2025, representing a 32.5% increase in new cases and a 2.5% rise in deaths compared to the 2021 data.4-6
The U.S. incidence rate for CaP was 116.5 per 100 000 men per year between 2017 and 2021, while the death rate was 19.0 per 100 000 men per year from 2018 to 2022.5-7 CaP represents a critical public health concern, particularly for older men, African American men, and those living in medically underserved areas.7,8 CaP incidence is nearly 70% higher in Black men than in White men, and Black men are more than twice as likely to die from CaP compared to White men.8-10 Advances in early detection, particularly through prostate-specific antigen (PSA) testing, along with improved treatments, have significantly increased survival rates. The 5-year relative survival rate for localized CaP is nearly 100%, although this drops to 31% for distant-stage disease.11-13 Despite these overall positive outcomes, substantial disparities persist across age, race, and geographic locations.
One of the most affected regions is Kentucky, particularly in the Appalachian region, which bears a disproportionately high CaP burden.14,15 Kentucky consistently ranks among the highest in the nation for both cancer incidence and mortality rates. According to the Centers for Disease Control and Prevention (CDC), Kentucky’s age-adjusted cancer mortality rate was 177.3 per 100 000 in 2020, placing it among the top states nationally. Within Appalachian Kentucky, CaP outcomes are notably worse. A study analyzing data from 2004 to 2019 found that men from Appalachian regions had poorer survival outcomes compared to non-Appalachian residents. 16 Factors contributing to this disparity included higher Gleason scores, elevated PSA levels, more aggressive tumors, and increased distant-stage diagnoses. 16 These clinical disparities are compounded by structural and social determinants such as rural provider shortages, limited healthcare access, high poverty rates, and lower screening adherence.14-16
Age is a significant risk factor for CaP.17,18 The likelihood of diagnosis increases to 1 in 52 men aged 50 to 59, and over 60% of all cases occur in men more than 65 years of age. 5 Older adults are more likely to be diagnosed with advanced disease and face complex treatment decisions further complicated by comorbidities and potential treatment side effects. 19 In Kentucky and neighboring Appalachian states such as West Virginia, Tennessee, and Alabama, where CaP significantly contributes to overall cancer burden,6,7,14,15,20 age-related disparities in diagnosis and treatment are more pronounced. These disparities are amplified by the region’s unique challenges, including limited healthcare infrastructure, socioeconomic disadvantages, and low screening rates.14,21,22 Moreover, older adults in these settings face greater barriers to timely diagnosis and care. Chronic conditions such as cardiovascular disease, diabetes, and obesity, which are highly prevalent in Kentucky, 22 can further complicate treatment and limit options for older CaP patients. Even when diagnosed, African American men in Kentucky experience poorer CaP outcomes, with survival disparities persisting after controlling for insurance status, treatment received, stage at diagnosis, and PSA levels. 14
Treatment for CaP ranges from active surveillance to surgery, radiation, chemotherapy, and hormonal therapy, depending on disease stage and patient characteristics.23-26 While survival has improved with earlier detection,27-30 limited access to timely diagnosis and treatment in Kentucky remains a major barrier. Although early detection through PSA testing can reduce mortality,28,31,32 many patients are diagnosed at later stages due to delays in care.14,33 These delays affect older men, who often require more aggressive treatments, compounding risks and worsening CaP prognosis.
CaP is a biologically complex disease shaped by genetic, environmental, and hormonal factors. Mutations in genes such as BRCA1/2 and DNA repair pathways increase susceptibility to aggressive forms. 34 Diagnostic approaches such as digital rectal exams (DRE), PSA tests, multiparametric magnetic resonance imaging (mpMRI), and biopsy aid early detection, 35 though PSA testing has limited specificity and risks overdiagnosis, highlighting the need for biomarker-driven strategies. Treatment of localized disease encompasses active surveillance, prostatectomy, or radiotherapy, whereas in advanced cases androgen deprivation therapy (ADT), sometimes combined with chemotherapy or second-line hormonal agents, is standard. These therapies can significantly impact quality of life (QoL) due to side effects including fatigue, sexual dysfunction, and hematologic toxicity.36,37 Age plays a major role in treatment decisions, with older adults often receiving less aggressive care due to comorbidities, frailty, patient preferences, or provider bias.38-41 In Kentucky, where chronic conditions are prevalent, treatment complexities and complication risks are especially high. 22 In many cases, older adults opt for or are steered toward less intensive treatments, leading to undertreatment that may lower both survival and QoL.42-46 Addressing these age-related disparities and optimizing treatment planning for aging patients with comorbidities is critical for delivering equitable CaP care.
Although many studies have explored factors influencing CaP survival,39,47,48 few have focused on age-based treatment variations and their effects on survival, especially in older populations. Despite documentation of age-related treatment disparities,38,39,48-53 limited research has examined how these disparities manifest in high-burden regions such as Kentucky. To address this gap, the Kentucky Cancer Registry (KCR), 54 a population-based central cancer registry, provides comprehensive data that supports detailed analysis of treatment patterns and survival outcomes. Utilizing KCR data allows for an in-depth examination of these factors across different age groups within this specific geographic context. This study aimed to investigate the relationship between age at diagnosis, treatment modalities, and survival outcomes among men diagnosed with CaP in Kentucky using ML models. Specifically, it sought to examine how age influences treatment selection and how those decisions affect survival, using KCR data to uncover age-related patterns and disparities. Our study findings can inform more personalized and equitable CaP care in high-risk populations.
Methods
Data Source and Cohort Selection
The KCR, part of the Surveillance, Epidemiology, and End Results (SEER) Program and the National Program of Cancer Registries (NPCR), provides comprehensive population-based data on cancer incidence, survival, and treatment modalities. 54 The KCR database includes detailed, patient-level information, along with demographic factors for individuals treated in clinics and hospitals across Kentucky. Mortality data from the Kentucky State mortality records and the National Death Index are routinely linked to ensure up-to-date survival outcomes. Following approval from the Institutional Review Board (IRB #63067) and completion of the data use agreement, we extracted de-identified level II CaP data from the KCR for the most recently available period (January 1, 2010, to December 31, 2022) for this retrospective study. The cohort included all non-institutionalized adult patients (aged ≥18 years) diagnosed with CaP during the study period. Patients with missing diagnosis, treatment, or staging data were excluded to maintain data quality and completeness. The final sample comprised 37 893 patients who met the predefined inclusion and exclusion criteria. This sample size was determined by the availability of cases within the database during the study period and is sufficient to ensure statistical power and reliable subgroup analyses.
Variable Classification
We categorized the study variables related to sociodemographic and clinical characteristics into five major domains: demographic, clinical, behavioral, treatment-related, and health system-level variables. Each domain reflects a distinct set of influences on CaP outcomes and is described in detail below.
Demographic variables
Demographic factors included age at diagnosis (continuous and categorized as <60 years [early] and ≥60 years [late]), self-reported race (White, Black, or Other), and ethnicity (Not Hispanic/Latino vs Other). Marital status was grouped into five categories: Married, Single/Never Married, Divorced/Separated, Widowed, and Living with Partner or Unknown/Not reported. Geographic location at diagnosis was dichotomized as Appalachian vs Non-Appalachian.
Clinical variables
Clinical characteristics encompassed tumor-specific data such as tumor grade (well, moderately, or poorly differentiated; undifferentiated; or unknown), tumor histology (classified as adenocarcinoma, NOS [ICD-O-3 code 8140], or Other/Unspecified), and tumor size (measured in millimeters). Staging was assessed using the SEER Summary Stage 2000 classification: in situ/non-invasive/localized; regional by direct extension; regional lymph nodes; direct extension; distant metastasis; unknown/unstageable; and regional spread undetermined. Sentinel lymph node status (positive, negative, or no sentinel nodes biopsied/unknown) and CS lymph node involvement (continuous count) were also included to evaluate disease spread.
Behavioral variables
Behavioral risk factors consisted of smoking status (categorized as smoker, non-smoker, or unknown/unspecified) and cigarette pack-years (a continuous variable capturing cumulative exposure to tobacco).
Treatment variables
For this analysis, treatment modalities were classified based on the initial course of therapy as reported in the KCR. Categories included: (1) no/unknown/refused therapy; (2) surgery at primary site only; (3) surgery at primary site plus other therapies; (4) chemotherapy-only or radiation-only therapies; (5) other therapy only; and (6) combination therapies without surgery.
The six categories used herein compress the fifteen categories used in the KCR, i.e., (1) surgery (radical prostatectomy, partial prostatectomy, or transurethral resection of the prostate (TURP)); (2) radiation therapy (external beam radiation, brachytherapy, or combination); (3) hormonal therapy (androgen deprivation therapy, e.g., LHRH agonists, anti-androgens); (4) chemotherapy; (5) immunotherapy; (6) cryotherapy (cryosurgery); (7) biologic therapy (including vaccines like sipuleucel-T); (8) clinical trial participation (treatment in a clinical trial involving experimental therapies); (9) treatment other than those listed (any other unspecified treatments); (10) combination of treatments (e.g., surgery and radiation); (11) surgery and hormonal therapy; (12) surgery and chemotherapy; (13) surgery and radiation therapy; (14) radiation and hormonal therapy; and (15) chemotherapy and hormonal therapy. Our category five (other therapy only) corresponds with the KCR category nine (treatment other than those listed).
Health system variables
Insurance type was used as a proxy for healthcare access and was classified into four groups: (1) Private insurance (including employer-sponsored plans); (2) Government-related programs (e.g., Medicare, Medicaid, TRICARE, Veterans Affairs); (3) Uninsured/Self-pay; and (4) Other or Unknown payer types.
Statistical Analysis
We summarized the CaP data using descriptive statistics and visualizations, reporting counts, means, standard deviations, median, minimum, and maximum for continuous risk factors, and counts and percentages for categorical risk factors. We employed a variety of statistical methods to compare risk factors across subgroups. For continuous variables, we used t-tests, in addition to nonparametric Wilcoxon tests to validate the results. Categorical risk factors were compared using the chi-square test. To assess differences in survival distributions among subgroups, we applied nonparametric methods such as the Kaplan-Meier survival curves. 55
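As an illustration of the Kaplan-Meier product-limit estimator mentioned above, the following minimal sketch computes a survival curve from follow-up times and event indicators. It is not the authors' code (the analyses were run in R); the function name `kaplan_meier` and the toy data are hypothetical and chosen only for illustration.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate.

    times  : follow-up time for each patient
    events : 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each distinct death time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)  # every patient leaves the risk set at their own time
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d > 0:
            # Multiply by the conditional probability of surviving past t
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        at_risk -= exits[t]  # remove deaths and censored cases at t
    return curve


# Hypothetical toy data: months of follow-up and event indicators
times = [5, 8, 8, 12, 15, 20]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t = {t:>2}  S(t) = {s:.3f}")
```

Censored patients contribute to the risk set until their censoring time but never trigger a drop in the curve, which is why the estimator suits registry data with incomplete follow-up.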
Table 1. The Distribution of the Risk Factors for Prostate Cancer
Table 2. Distribution of Prostate Cancer Characteristics by Categorical Variables
Following imputation, we applied two advanced modeling techniques: Random Survival Forest (RSF) 56 and the Elastic Net regularization procedure. 57 RSF is a nonparametric ensemble learning method well-suited for survival data, capable of capturing complex nonlinear relationships and interactions between variables. Elastic Net, a regularized regression method that combines the Lasso and Ridge penalties, was used to handle multicollinearity and perform variable selection. Leave-one-out cross-validation was used to select the penalty.
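For reference, one common way to write an Elastic Net fit for survival data is as a penalized Cox partial likelihood. The formulation below is a sketch under that assumption; the text does not state the exact objective or scaling used, and the symbols (β for coefficients, λ for the overall penalty, α for the Lasso/Ridge mix, δ_i for the event indicator, R(t_i) for the risk set at time t_i) are introduced here only for illustration.

```latex
\hat{\beta} \;=\; \arg\min_{\beta}\;
  \left\{ -\frac{1}{n}\,\ell(\beta)
  \;+\; \lambda \left[ \alpha \,\lVert \beta \rVert_{1}
  \;+\; \frac{1-\alpha}{2}\,\lVert \beta \rVert_{2}^{2} \right] \right\},
\qquad
\ell(\beta) \;=\; \sum_{i:\,\delta_i = 1}
  \left[ x_i^{\top}\beta
  \;-\; \log \sum_{j \in R(t_i)} \exp\!\left( x_j^{\top}\beta \right) \right].
```

Setting α = 1 recovers the Lasso and α = 0 recovers Ridge; λ is the quantity tuned by cross-validation.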
The fitted RSF model was evaluated using the out-of-bag (OOB) estimation procedure. This procedure provides a robust method for assessing model performance without requiring a separate validation set. Thus, for each tree in the forest, approximately one-third of the data is left out during bootstrapping; these left-out samples are then used to compute prediction errors, offering an unbiased estimate of model accuracy. The performance metric for our model is the Continuous Ranked Probability Score (CRPS), which evaluates the accuracy of predicted survival distributions rather than point estimates. CRPS penalizes both the distance between predicted and actual survival times and the spread of uncertainty, making it particularly suitable for probabilistic survival predictions. The standard out-of-bag value in this context refers to the average CRPS calculated over all OOB samples, offering a comprehensive summary of the model’s predictive performance across the entire dataset.
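Two quantities in this paragraph can be made concrete with a short sketch. The first function gives the chance that a case is left out of a bootstrap sample, which is where the "approximately one-third" figure comes from. The second computes a simplified CRPS for a single patient as the integrated squared gap between a predicted survival curve and the observed alive/dead indicator. This is an illustrative simplification, not the registry analysis: it assumes an uncensored patient and omits the censoring weights a real survival CRPS requires, and both function names are hypothetical.

```python
import math


def oob_fraction(n):
    """Probability that a given case is absent from a bootstrap sample of size n."""
    return (1.0 - 1.0 / n) ** n


def crps_uncensored(grid, surv_pred, event_time):
    """Simplified CRPS for one uncensored patient (assumption: no censoring).

    grid       : increasing time points
    surv_pred  : predicted survival probability at each grid point
    event_time : observed time of death
    Integrates (S_hat(u) - 1{event_time > u})^2 over the grid (trapezoid rule).
    """
    sq = [
        (s - (1.0 if event_time > u else 0.0)) ** 2
        for u, s in zip(grid, surv_pred)
    ]
    total = 0.0
    for i in range(1, len(grid)):
        total += 0.5 * (sq[i] + sq[i - 1]) * (grid[i] - grid[i - 1])
    return total


# Out-of-bag share for a cohort of the size analyzed here
print(f"OOB fraction: {oob_fraction(37893):.4f}")

# Hypothetical predictions on a yearly grid for a patient who died at year 2
grid = [0, 1, 2, 3, 4]
sharp = crps_uncensored(grid, [1.0, 1.0, 0.0, 0.0, 0.0], event_time=2)
vague = crps_uncensored(grid, [0.5, 0.5, 0.5, 0.5, 0.5], event_time=2)
print(f"sharp prediction: {sharp:.3f}, vague prediction: {vague:.3f}")
```

A confident and correct prediction scores lower than an uninformative one, which is the sense in which CRPS rewards both accuracy and appropriately narrow uncertainty.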
All hypothesis tests were performed at the standard 5% significance level, with results considered statistically significant if the P-value was less than 0.05. All statistical analyses were conducted using R statistical software (R Core Team, 2024; R Foundation for Statistical Computing, Vienna, Austria). The reporting of this study conforms to STROBE guidelines.58,59
Results
Sociodemographic and Clinical Characteristics
Among 37 893 CaP patients, the overall mean age at diagnosis was 66.9 years (SD = 9.03). Patients diagnosed before age 60 (N = 9082) had a mean age at diagnosis of 55.5 years (SD = 4.0), while those diagnosed later (N = 28 811) had a mean age at diagnosis of 70.6 years (SD = 6.9). Tumor size was similar across age groups (mean ≈ 22.9 mm), and lymph node involvement averaged 5.3 nodes. Later-diagnosed patients had higher mean cigarette pack-year exposure (19.2) than those diagnosed earlier (14.2) (Table 1).
Table 2 shows the distribution of categorical variables among CaP patients. The majority of patients were White (87.9%) and non-Hispanic (94.2%). Most were single or never married (62.5%), and about one-quarter (25%) resided in Appalachia. Adenocarcinoma was the predominant histological subtype (97.1%), and tumors were most commonly moderately differentiated (40.0%). Sentinel lymph node biopsies were not performed in 72.3% of cases, and only 1.2% had confirmed positive nodes. Regarding the SEER summary stage, 43.3% of patients had regional spread, 42.5% had distant metastasis, and just 4.1% had localized disease. The most common treatment modality was surgery at the primary site only (31.1%), and over half of the patients (56%) were covered by government insurance programs.
Figure 1 illustrates a trend in treatment choice by age at diagnosis. Patients who received surgery at the primary site only were younger at diagnosis (mean = 63.9 years, SD = 8.52, median = 64). In contrast, patients receiving Other Therapy Only were the oldest group (mean = 74.4 years, SD = 9.7, median = 75). Patients who underwent chemotherapy or radiation only (mean age = 66.8 years), surgery plus other therapies (mean age = 66.6 years), or combination therapy (mean age = 68.7 years) had intermediate mean ages at diagnosis.
Figure 1. Comparison of the Average Age at Diagnosis by Type of Prostate Cancer Treatment Received
Figure 2 illustrates the relationship between the type of treatment received and the age at diagnosis, stratified by survival status. The highest mean age at diagnosis was observed among patients who received “Other” therapies, regardless of their survival outcome. Conversely, those who underwent surgery at the primary site only tended to have a lower mean age at diagnosis than recipients of other treatments. This trend suggests that patients receiving alternative treatment options tend to be older at the time of diagnosis, which could indicate a preference for less conventional treatments as patients age. The pattern persists across both survivors and non-survivors, emphasizing the role of age in treatment selection. The results also show how age at diagnosis is related to the likelihood of receiving certain therapies and, ultimately, to overall survival. Figure 3 illustrates that individuals who received “surgery at the primary site only” or “chemotherapy only or radiation therapy only” had better survival outcomes. In contrast, those who underwent “other therapy” experienced the poorest survival prospects.
Figure 2. Comparison of the Average Age at Diagnosis by Survival Status and Prostate Cancer Treatment Received
Figure 3. Comparison of Kaplan-Meier Survival Curves by Prostate Cancer Treatment Received. Patients who received “Other Therapy” had significantly poorer survival than those who received the other treatment options.

Figure 4 shows that “Surgery plus other therapy” and “Other therapy” were the least commonly selected treatment options across all insurance types, indicating that patients, regardless of insurance coverage, were most likely to receive surgery at the primary site. Figure 5 illustrates that patients with private insurance or those who were uninsured/self-pay were more often diagnosed at younger ages compared to those covered by government-related insurance programs or listed as having “other/unknown” insurance payer types.
Figure 4. Distribution of Insurance Types Across Prostate Cancer Treatment Groups. Most People With Private Insurance Types Are Less Likely to Seek Treatment
Figure 5. Distribution of Age at Diagnosis by Insurance Type. Individuals With Private Insurance or Those Who Self-Pay or Are Uninsured Tend to Be Diagnosed at a Younger Age Compared to Those With Government-Related or Other/Unknown Insurance Types

Prostate Cancer Mortality Rates by Treatment Type and Timing of Diagnosis
The results from the survival forest model identified several key risk factors associated with CaP survival, including age, treatment type, smoking status, cancer histology, cigarette pack-years, CS lymph node involvement, insurance type, tumor size, marital status, ethnicity, number of positive lymph nodes, race, and geographical location (Appalachia vs not) (Figure 6). The fitted model achieved an out-of-bag Continuous Ranked Probability Score (CRPS) of 1.700, a standard out-of-bag value of 0.121, and a requested performance error of 0.195, indicating a very good fit. Age had approximately 2.5 times the influence of treatment type on CaP survival (Figure 6; Panel A).
Figure 6. Variable Importance Plots (Panels A–C) Highlighting the Key Risk Factors Associated With Overall CaP Survival (Panel A), Survival Among Individuals Diagnosed Early (Panel B), and Survival Among Those Diagnosed Later (Panel C)
Based on the above, we explored how the timing of diagnosis influenced CaP risks. Specifically, we examined age subgroups to determine whether diagnosis of CaP before age 60 had different associated risk factors compared to a later diagnosis. The fitted model for individuals diagnosed with CaP before the age of 60 had a standard out-of-bag value of 0.085 and a requested performance error of 0.224. In this model, the primary predictors of survival included the type of treatment received, insurance type, smoking status, CS lymph node involvement, tumor grade, tumor size, cancer histology, number of positive nodes, cigarette pack-years, marital status, geographical location, and ethnicity (Figure 6; Panel B). The impact of insurance type on CaP survival was approximately 50% of that of the treatment received. This suggests that the type of insurance held by a patient may influence the treatment they receive.
However, the relative importance of risk factors differed between individuals diagnosed with CaP at a later age and those diagnosed earlier. For the later-diagnosed group, the most influential predictors, ranked by importance, were treatment type, smoking status, histology, tumor grade, cigarette pack-years, insurance type, marital status, CS lymph node involvement, ethnicity, tumor size, race, and geographic residence (Appalachia) (Figure 6; Panel C). The model demonstrated strong predictive performance, with a standard CRPS value of 0.147 and a requested performance error of 0.240, indicating a very good fit. Notably, the impact of smoking status was about 80% of that of the type of treatment received.
Table 4. Parameter Estimates From the Elastic Net Model for Overall, Early, and Late-Stage Prostate Cancer Diagnosis
Footnote: “.” indicates that the estimate is not applicable or could not be computed due to a small sample size.
In contrast, patients who received “Surgery at primary site only,” “Chemotherapy-Only or Radiation-Only Therapies,” or “Combination Therapies (without surgery)” had negative coefficients (−0.348, −0.440, and −0.102, respectively), indicating better survival relative to patients who received the reference treatment. Patients who received “Surgery at primary site + Other Therapies” had a small positive coefficient (0.074), implying a minimal or negligible adverse effect. A similar pattern was observed in both the early- and late-diagnosis groups. Notably, receiving “Other Therapy Only” appears to be particularly detrimental for patients diagnosed early with CaP. The rest of the results for Table 4 can be interpreted in a similar way.
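To make the coefficients easier to read, the sketch below converts them to hazard ratios by exponentiation. This assumes the Elastic Net coefficients are log hazard ratios from a Cox-type model, which is consistent with the hazard language used in the Discussion but is not stated explicitly in the Methods; the helper names are hypothetical.

```python
import math

# Coefficients quoted in the text, relative to the reference treatment.
# Assumption: these are log hazard ratios from a Cox-type elastic net.
coefs = {
    "Surgery at primary site only": -0.348,
    "Chemotherapy-only or radiation-only": -0.440,
    "Combination therapies (without surgery)": -0.102,
    "Surgery at primary site + other therapies": 0.074,
}


def hazard_ratio(beta):
    """Hazard ratio implied by a log-hazard coefficient."""
    return math.exp(beta)


def percent_change_in_hazard(beta):
    """Percent change in hazard relative to the reference group."""
    return 100.0 * (math.exp(beta) - 1.0)


for name, beta in coefs.items():
    print(
        f"{name}: HR = {hazard_ratio(beta):.3f} "
        f"({percent_change_in_hazard(beta):+.1f}% hazard)"
    )
```

Under this reading, a negative coefficient corresponds to a hazard ratio below 1 (lower hazard of death than the reference group) and a positive coefficient to a hazard ratio above 1.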
Discussion
CaP remains a significant public health concern, particularly in geographic regions and subgroups with elevated incidence and mortality rates. A comparison of overall mortality (Table 4) between White and Black individuals shows that Black men have a 9.2% higher hazard of death, holding all other variables constant. This difference likely reflects longstanding structural and social determinants of health, including unequal access to care, socioeconomic disparities, environmental exposures, and systemic racism within the healthcare system. Identifying key risk factors that influence patient survival is crucial for informing clinical decision-making and guiding public health intervention efforts, and comprehensive, population-level data combined with advanced analytical techniques allowed us to identify these determinants. In this population-based analysis of over 37 000 CaP cases from the KCR, we identified age at diagnosis, treatment modality, and smoking status as the most influential predictors of survival. Machine learning techniques, specifically Random Survival Forest and Elastic Net regularization, enabled us to quantify the relative importance of these factors, with age emerging as approximately 2.5 times more impactful than treatment type. Patients diagnosed earlier (before age 60) were more likely to undergo surgery and had better survival outcomes, while older patients were disproportionately represented in non-surgical or “other” therapy groups, which were associated with higher mortality. While our ML model identified “surgery at primary site only” as the optimal treatment strategy across patient profiles, a notable proportion (26%) of patients in the cohort received no documented therapy or had unknown treatment status. This discrepancy highlights real-world challenges that extend beyond clinical decision-making and model prediction.
Several systemic and patient-level barriers may contribute to the absence of treatment, including limited access to specialty care, delays in diagnosis or referral, insurance-related constraints, geographic disparities (particularly in rural or underserved areas), and patient factors such as comorbidities, refusal of care, or socioeconomic barriers. These considerations may limit the feasibility of model-identified “optimal” strategies in practice. As such, our findings underscore the importance of integrating predictive modeling with health system factors when interpreting treatment recommendations and designing interventions to improve equitable access to care. Insurance status and smoking also significantly shaped survival trajectories, particularly among patients diagnosed early.
Although many of the same risk factors were essential across both early and late diagnosis of CaP, their relative importance shifted. For example, insurance type played a more significant role among early-diagnosed cases, but its influence diminished among later-diagnosed individuals. In contrast, smoking status, tumor characteristics, and histology remained consistently influential across both groups. The inclusion of race and regional factors, along with the amplified impact of smoking-related risks, suggests that long-term exposures and sociodemographic factors may exert increasing influence on survival outcomes over time. These findings reinforce the existing literature, which documents disparities in treatment and outcomes based on age, race, and geography.39,60-62 Our study adds a novel contribution by quantifying these associations using ML. Prior research has shown that older men with CaP often receive less aggressive treatment.39,44,63,64 Our results enhance this understanding by demonstrating the outsized influence of age relative to other clinical factors.
Additionally, smoking was identified as a significant survival determinant, nearly as impactful as treatment type in our models. This aligns with the broader cancer outcomes literature, where smoking is associated with both increased risk of CaP incidence and mortality.65-67 However, other studies suggest this association may be context-specific. For example, Joshu et al reported no significant link between smoking and CaP recurrence among men treated with radical prostatectomy, 68 and Moreira et al. similarly reported no association with CaP-specific mortality. 69 These contrasting findings underscore the complexity of smoking’s role in CaP prognosis, which may vary based on treatment type, timing of exposure measurement, and population studied. Nonetheless, our findings underscore the urgent need to incorporate smoking cessation into CaP survivorship care, especially given its consistent link to adverse outcomes across other cancer types.70,71
Our results strongly support early detection strategies. Patients diagnosed before 60 years of age were more likely to receive surgery and had substantially lower mortality rates. In fact, those diagnosed after the age of 60 had 65.2% lower odds of survival (OR: 0.35), emphasizing the potential life-saving benefits of early diagnosis. These results suggest a need to reconsider existing CaP screening guidelines, which often recommend that screening begin at age 55. While our dataset included relatively few men diagnosed before age 50, the pronounced survival benefit among younger patients supports further investigation into whether initiating screening at an earlier age may be warranted, especially for high-risk populations. 17
Surgical treatment was consistently linked to the most favorable survival outcomes. However, older patients were significantly less likely to receive surgery, which may reflect provider bias, assumptions about frailty, or the presence of comorbidities. This highlights the importance of shared decision-making frameworks that carefully evaluate individual health, rather than relying solely on chronological age.39,72
Insurance type also significantly influenced both treatment choice and the timing of diagnosis. Patients with private insurance were more likely to be diagnosed early and undergo surgery, both of which are associated with improved survival outcomes. Conversely, individuals covered by government insurance programs and uninsured/self-pay patients were more likely to experience delayed diagnosis and receive less aggressive treatment, contributing to poorer outcomes. These patterns align with existing literature, which shows that insurance status independently predicts access to early CaP screening and timely intervention.28,73 Addressing these disparities through policy reforms could meaningfully improve CaP survival outcomes.
Additional variables, such as sentinel lymph node positivity, tumor grade, and marital status, contributed variably across diagnosis groups. These patterns support the development of individualized prognostic models that integrate both clinical and social determinants of health.
Our findings yield several key clinical and public health implications. Screening efforts should prioritize older adults and underserved regions such as Appalachia. Policy reforms aimed at improving insurance access could facilitate earlier diagnosis and more equitable treatment. Smoking cessation should be integrated into standard CaP care pathways. Additionally, ML-driven risk stratification models can be embedded into clinical workflows to support personalized treatment planning.74 Public health systems should also strengthen community outreach to promote early detection, particularly among high-risk and underserved populations.
This study has notable strengths. We leveraged a large, population-based cancer registry with linked mortality data, enhancing the validity of our survival analyses. The use of ML enabled the capture of complex, nonlinear associations that may be missed by traditional regression approaches. Nonetheless, certain limitations must be acknowledged. As a retrospective study, we cannot establish causality. We lacked access to key variables such as PSA levels, provider decision-making rationale, genomic markers such as BRCA, and patient adherence to treatment. Moreover, the use of level II data limited our ability to examine lifestyle-related risk factors such as alcohol consumption, obesity, or diet. Finally, because the cohort was drawn from a single state registry, generalizability beyond Kentucky and similar populations may be limited.
Future studies should aim to validate these findings across broader geographic and demographic settings. Integration of genomic and molecular markers could enhance the precision of prognostic modeling. There is an urgent need to develop decision-support tools that apply ML to guide treatment planning tailored by age and risk profile. Ultimately, addressing persistent disparities in early diagnosis and access to treatment remains a pressing priority.
Overall, our study demonstrates that age, treatment modality, and smoking status are dominant predictors of CaP survival, with age exerting the strongest influence. By leveraging ML, we were able to identify critical, modifiable, and context-specific risk factors that can inform personalized care strategies and systemic policy reform. These findings support early detection, equitable access to care, and tailored survivorship interventions, particularly in high-risk, underserved regions like Kentucky.
Conclusion
In summary, this study demonstrates that age, treatment modality, smoking status, and insurance type are key determinants of CaP survival. By applying ML methods to a large, population-based dataset, we identified both well-known and underrecognized factors influencing outcomes, many of which are modifiable. These findings emphasize the need for earlier diagnosis, equitable treatment access, and integration of behavioral health strategies such as smoking cessation. Our work also highlights the feasibility and utility of ML models to disentangle complex survival pathways and support the development of precision public health tools. As CaP care evolves toward a more personalized approach, these data-driven insights can inform actionable interventions that bridge gaps in access and outcomes. Ultimately, addressing disparities in diagnosis timing, treatment choice, and behavioral health will be central to reducing CaP mortality, particularly in high-burden and underserved populations like those in Kentucky and Appalachia.
Footnotes
Acknowledgments
We thank Dr Jaclyn McDowell at the Kentucky Cancer Registry for her assistance in extracting the prostate cancer dataset.
Ethics Statement
This study was approved by the University of Kentucky Institutional Review Board (IRB #63067 dated 10/15/2024). Patient consent was not applicable/required for this retrospective cohort analysis.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Manuscript preparation was supported in part by funds from the Department of Biostatistics at the University of Kentucky, the Oklahoma Tobacco Settlement Endowment Trust (TSET) grant R23-02, and the National Cancer Institute (NCI) Cancer Center Support Grant (P30CA225520), the latter two awarded to the OU Health Stephenson Cancer Center.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Resource Availability
Programming code is available upon request.
