Abstract
Highlights
• Seventeen clinical and lifestyle risk factors of CAC used in predictive modeling.
• Machine learning models outperformed traditional binary logistic regression.
• Study highlights ML’s potential for early CAC detection in firefighters.
Introduction
Firefighting, one of the most hazardous occupations, exposes individuals to extreme conditions such as hazardous chemicals, severe temperatures, and oxygen-deprived environments.1–5 The physical demands of the job, combined with the use of heavy protective equipment, place significant strain on the cardiovascular system. 6 Consequently, sudden cardiac death remains the leading cause of on-duty fatalities among firefighters, accounting for nearly 50% of all line-of-duty deaths. 7 Although firefighters may be perceived as healthier than the general population due to mandatory physical fitness standards, they face elevated cardiovascular risk due to a combination of occupational stressors—such as heat exposure, exertion, and disrupted sleep cycles—and traditional coronary artery disease (CAD) risk factors including obesity, hypertension, diabetes, and aging.8–12
Given this unique risk profile, firefighters represent a critical population for early detection of subclinical atherosclerosis. Coronary artery calcium (CAC) scoring has emerged as a robust, non-invasive tool for assessing cardiovascular risk, offering a quantifiable measure of coronary atherosclerosis, and serving as an independent predictor of future cardiac events.13–16 This study underscores the importance of predicting CAC using patient characteristics to enhance the health and safety of firefighters, aiming to promote early detection and adoption of preventative measures against cardiac events in this vulnerable and high-risk population.
Recent advancements in artificial intelligence (AI), particularly in machine learning (ML), have revolutionized medical diagnostics, offering unprecedented accuracy in clinical outcome predictions.17–22 ML algorithms excel at processing extensive datasets and have demonstrated superior pattern recognition abilities compared to humans, making them invaluable for understanding the complex risk factors linked to CAC.23–28 Despite the potential of ML, few studies have leveraged these algorithms to predict CAC in firefighters. We hypothesize that machine learning algorithms can outperform traditional logistic regression models in predicting CAC. Therefore, our research aims to employ ML algorithms to examine the risk factors for CAC and to predict CAC status in individual firefighters. By integrating clinical factors and lifestyle risk factors relevant to firefighting, we aim to develop a predictive model that serves as a valuable resource for proactive firefighter health management. Our comprehensive approach leverages various risk parameters to enhance the accuracy of CAC predictions, potentially revolutionizing preventive healthcare practices for this high-risk demographic.
Methods
Study population
This study utilized a comprehensive set of data, including electronic health records, clinical evaluations, and laboratory findings, from a cohort of firefighters. The data were collected at Ascension Public Safety Medical, an occupational health facility that provides both pre-employment and annual medical assessments for firefighters across multiple departments in the Midwest, primarily based in Indiana. No formal research questionnaire was administered as part of this study. Participants were active-duty firefighters aged 35 to 68 years who completed a full annual health screening. The cohort predominantly (93.3%) comprised male firefighters. This screening included demographic information, a physical examination, clinical laboratory testing, a physical fitness assessment, and a non-contrast computed tomography scan for CAC scoring. Individuals were excluded if they were not actively employed as firefighters at the time of evaluation or had incomplete health records. All data were extracted from standardized electronic health records and reviewed for completeness and quality. Informed consent for the utilization of their health records in research was obtained from all participants at the time of these assessments. The ethical conduct of this study was ensured through approval from the XXX (protocol number XXX) and the XXX (protocol number: XXX). All research involving human subjects was performed in accordance with the Declaration of Helsinki and relevant guidelines and regulations.
Coronary artery calcium scores assessment
CAC levels were assessed using non-contrast computed tomography (CT) imaging, a reliable non-invasive method for evaluating subclinical coronary atherosclerosis. Participants were scanned using an Imatron C Electron Beam Tomography (EBT) scanner following a standardized protocol designed to optimize image quality while minimizing radiation exposure. CAC scores were calculated using the Agatston method, which combines lesion area and density for quantification. Lesions were defined as regions with a density exceeding 130 Hounsfield units and a minimum area of 1 mm². The total CAC score was computed as the sum of all lesion scores. A board-certified radiologist specializing in cardiovascular imaging, blinded to participants’ clinical data, independently reviewed all scans to confirm scoring accuracy. For this analysis, CAC was dichotomized into two categories: CAC = 0 (no detectable coronary calcium) and CAC >0 (any detectable coronary calcium), a classification frequently used in population studies to distinguish between absence and presence of subclinical atherosclerosis. Additional methodological details are provided in Hoff et al. (2001). 29
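To make the scoring step concrete, the following minimal sketch computes an Agatston-style score from a list of lesions and applies the CAC = 0 versus CAC >0 dichotomy. The density weighting bands (1, 2, 3, 4 for roughly 130–199, 200–299, 300–399, and 400 or more Hounsfield units) follow the commonly cited Agatston convention and are an assumption here, not something stated in this paper; the function names and the sample lesions are hypothetical.

```python
def density_weight(peak_hu):
    """Agatston-style density factor for a lesion's peak attenuation (HU)."""
    if peak_hu <= 130:  # the text defines lesions as exceeding 130 HU
        return 0
    if peak_hu < 200:
        return 1
    if peak_hu < 300:
        return 2
    if peak_hu < 400:
        return 3
    return 4


def agatston_score(lesions, min_area_mm2=1.0):
    """Sum area x density weight over lesions given as (area_mm2, peak_hu)."""
    total = 0.0
    for area_mm2, peak_hu in lesions:
        if area_mm2 >= min_area_mm2:  # lesions below the minimum area are ignored
            total += area_mm2 * density_weight(peak_hu)
    return total


def cac_present(score):
    """Dichotomize as in the analysis: 0 for CAC = 0, 1 for CAC > 0."""
    return int(score > 0)


# Hypothetical lesions: (area in mm^2, peak density in HU)
lesions = [(4.0, 250), (2.5, 410), (0.5, 300)]
score = agatston_score(lesions)
label = cac_present(score)
```

The third lesion is skipped because its area falls below the 1 mm² minimum, so only the first two contribute to the total.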
Risk factors
Age was recorded in years. Body Mass Index (BMI) was calculated for each participant by dividing their weight in kilograms by the square of their height in meters, serving as an index to determine overweight status, with a BMI of 25 kg/m² or higher indicating overweight.
Participants underwent blood tests to analyze their lipid profiles, specifically measuring levels of low-density lipoprotein (LDL) cholesterol and total cholesterol, reported in milligrams per deciliter (mg/dL). Electrolyte levels, including sodium and chloride, were quantified in millimoles per liter (mmol/L). Additional evaluations included renal function assessed by glomerular filtration rate, blood glucose levels, and a complete blood count with a particular emphasis on monocyte levels (measured in thousands per cubic millimeter - K/CUMM) and platelet count.
Participants were asked to self-report their tobacco usage by responding with a simple yes or no on a health risk assessment (HRA) questionnaire. A family history of cardiovascular disease was also recorded in the HRA if participants reported a history of heart disease among direct family members. Resting systolic blood pressure and alkaline phosphatase levels were measured. Cardiorespiratory fitness was evaluated by measuring the maximum volume of oxygen consumption (MaxVO2).
This comprehensive data collection was designed to construct a multidimensional health profile for each firefighter, considering physical metrics, lifestyle factors, and medical history to inform the predictive modeling of CAC.
Machine learning algorithms
The Random Forest (RF) algorithm employs randomness to generate a multitude of decision trees. The combined output of these trees is consolidated into a cohesive outcome using a majority vote system for classification tasks and a mean value calculation for regression activities. Randomization in RF occurs through two primary methods. First, the dataset undergoes “bootstrap sampling,” where samples are drawn with replacement to create multiple bootstrap samples. This process is known as “bootstrap aggregation” or “bagging.” Second, randomization occurs at the decision nodes, where a subset of predictors is selected at each node. Typically, for a dataset with p predictors, the number of predictors chosen is approximately the square root of p, although this parameter can be adjusted. This random selection of variables and evaluation of thresholds continues until nodes become “pure” (containing only cases or controls) or meet a predetermined stopping criterion. The process of building trees is repeated multiple times, typically between 100 and 1000 iterations, to form the random forest. Once the random forest model is built, it is used for predictions. Every new instance is processed by all the trees within the model, and the forecasted result is determined by the majority class (the class that gains the highest number of “votes” from the trees) for classification tasks, or by calculating the mean of all the predictions for regression tasks.
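The two sources of randomness and the majority vote can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the implementation used in the study: each "tree" is a single-split stump rather than a fully grown tree, and the data set and all function names are hypothetical.

```python
import math
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (the bagging step)."""
    return [rng.choice(rows) for _ in rows]


def fit_stump(rows, n_features, rng):
    """Fit a one-split tree using a random subset of about sqrt(p) features."""
    k = max(1, round(math.sqrt(n_features)))
    best = None
    for j in rng.sample(range(n_features), k):
        for threshold in sorted({x[j] for x, _ in rows}):
            left = [y for x, y in rows if x[j] <= threshold]
            right = [y for x, y in rows if x[j] > threshold]
            if not left or not right:
                continue
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, j, threshold, left_label, right_label)
    if best is None:  # degenerate sample: fall back to the majority class
        label = Counter(y for _, y in rows).most_common(1)[0][0]
        return (0, float("inf"), label, label)
    return best[1:]


def predict_stump(stump, x):
    j, threshold, left_label, right_label = stump
    return left_label if x[j] <= threshold else right_label


def fit_forest(rows, n_trees=100, seed=0):
    rng = random.Random(seed)
    n_features = len(rows[0][0])
    return [fit_stump(bootstrap_sample(rows, rng), n_features, rng)
            for _ in range(n_trees)]


def predict_forest(forest, x):
    """Majority vote across all trees."""
    votes = Counter(predict_stump(stump, x) for stump in forest)
    return votes.most_common(1)[0][0]


# Hypothetical data: four features that all increase with the outcome.
rows = [((30 + i, 100 + i, 20 + 0.3 * i, 90 + i), int(i >= 20)) for i in range(40)]
forest = fit_forest(rows, n_trees=25, seed=0)
low_risk = predict_forest(forest, rows[0][0])
high_risk = predict_forest(forest, rows[-1][0])
```

With four predictors, each stump considers two randomly chosen features, which mirrors the square-root rule described above.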
Extreme Gradient Boosting (XGBoost) is a powerful implementation of the gradient boosting algorithm, which integrates the outputs of numerous weak models, typically decision trees, to form a robust predictive model. XGBoost iteratively adds trees in a sequence to correct errors made by the preceding trees, enhancing model performance. It supports a wide range of predictive modeling tasks, including regression, classification, and ranking. A distinguishing feature of XGBoost is its incorporation of regularization (L1 and L2), which helps prevent overfitting and improves model generalizability. The following objective function is optimized by XGBoost at each iteration t.
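$$\mathcal{L}^{(t)} = \sum_{i=1}^{n} l\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t), \qquad \Omega(f_t) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^{2}$$
Here l is a differentiable loss function comparing the observed outcome y_i with the prediction; \hat{y}_i^{(t-1)} is the prediction after t − 1 trees; f_t is the tree added at iteration t; T is the number of leaves in the tree; w_j is the score on the j-th leaf; and γ and λ are parameters that control the complexity of the model, where γ is the penalty for the number of leaves and λ is the L2 regularization term on the leaf weights. This regularization term enhances the model’s generalizability by shrinking the final learned weights, which helps prevent overfitting.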
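As a small worked illustration of the regularization just described, the sketch below evaluates the complexity penalty for a tree (γ per leaf plus the L2 penalty λ on the leaf weights) and shows how λ shrinks a leaf score. The closed-form leaf weight −G/(H + λ) comes from the standard second-order derivation of gradient boosting and is an assumption relative to this paper; all names and numbers are hypothetical.

```python
def complexity_penalty(leaf_weights, gamma, lam):
    """gamma * (number of leaves) + 0.5 * lam * sum of squared leaf weights."""
    n_leaves = len(leaf_weights)
    return gamma * n_leaves + 0.5 * lam * sum(w * w for w in leaf_weights)


def optimal_leaf_weight(grad_sum, hess_sum, lam):
    """Leaf score -G / (H + lam); a larger lam pulls the score toward zero."""
    return -grad_sum / (hess_sum + lam)


# Hypothetical three-leaf tree
penalty = complexity_penalty([0.5, -0.2, 0.1], gamma=1.0, lam=2.0)

# Same gradient statistics, without and with L2 regularization
unregularized = optimal_leaf_weight(grad_sum=-4.0, hess_sum=6.0, lam=0.0)
regularized = optimal_leaf_weight(grad_sum=-4.0, hess_sum=6.0, lam=2.0)
```

Comparing the last two values shows the shrinkage effect: the regularized leaf score is smaller in magnitude than the unregularized one.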
Support Vector Machine (SVM) is a robust and versatile supervised learning model capable of handling both regression and classification tasks. The fundamental concept of SVM is to identify the optimal hyperplane that best separates the classes in the feature space. The ideal hyperplane maximizes the margin, defined as the distance to the nearest training data points from either class, thereby enhancing the classifier’s robustness. Support vectors, the data points closest to the hyperplane from each class, are critical in this process as they determine the hyperplane’s position and orientation. These support vectors influence the decision boundary and are pivotal in the construction of the SVM model.
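When applied to binary classification, support vector machines seek the hyperplane that best separates the two classes. In a space with d dimensions, the equation of a hyperplane is:
$$w^{\top} x + b = 0$$
The weight vector is denoted by w, the input feature vector by x, and the bias by b. The goal is to find the solution that minimizes the norm of the weight vector while classifying every training point correctly:
$$\min_{w,\,b}\ \frac{1}{2}\lVert w \rVert^{2} \quad \text{subject to} \quad y_i\left(w^{\top} x_i + b\right) \ge 1,\quad i = 1, \dots, n$$
where y_i ∈ {−1, +1} is the class label of the i-th training point. When the classes are not perfectly separable, the problem becomes:
$$\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i \quad \text{subject to} \quad y_i\left(w^{\top} x_i + b\right) \ge 1 - \xi_i,\quad \xi_i \ge 0$$
where ξ_i are slack variables that allow margin violations and C is the regularization parameter.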
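The role of C can be illustrated by evaluating a soft-margin objective for a fixed hyperplane. This sketch assumes the standard formulation, in which the slack for each point equals the hinge loss max(0, 1 − y(w·x + b)) with labels coded as −1 and +1; the points, weights, and function names are hypothetical.

```python
def decision_value(w, b, x):
    """Signed score w . x + b for one feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def soft_margin_objective(w, b, points, labels, C):
    """0.5 * ||w||^2 + C * sum of hinge-loss slacks."""
    margin_term = 0.5 * sum(wi * wi for wi in w)
    slack = sum(max(0.0, 1.0 - y * decision_value(w, b, x))
                for x, y in zip(points, labels))
    return margin_term + C * slack


# Hypothetical data: the third point lies inside the margin (slack 0.5)
points = [(2.0, 0.0), (-2.0, 0.0), (0.5, 1.0)]
labels = [1, -1, 1]
w, b = [1.0, 0.0], 0.0

loose = soft_margin_objective(w, b, points, labels, C=1.0)
strict = soft_margin_objective(w, b, points, labels, C=10.0)
```

A larger C penalizes the same margin violation more heavily, which is why C is usually tuned rather than fixed in advance.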
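Based on Bayes’ theorem, the Naïve Bayes (NB) algorithm is a classification method that calculates the probability of a class given a set of features that are assumed to be conditionally independent. The posterior probability of class C given predictor X is:
$$P(C \mid X) = \frac{P(X \mid C)\, P(C)}{P(X)}$$
where P(C) is the prior probability of the class, P(X | C) is the likelihood of the predictor given the class, and P(X) is the prior probability of the predictor.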
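A minimal sketch of the posterior calculation follows, assuming Gaussian likelihoods for continuous predictors (one common choice; the paper does not state which variant was used). The class priors, the per-class means and variances, and the two example predictors are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def posterior(x, priors, params):
    """P(C | X) is proportional to P(C) * product of P(x_j | C) over features."""
    joint = {}
    for c, prior in priors.items():
        likelihood = 1.0
        for xj, (mean, var) in zip(x, params[c]):
            likelihood *= gaussian_pdf(xj, mean, var)  # independence assumption
        joint[c] = prior * likelihood
    evidence = sum(joint.values())  # P(X), the normalizing constant
    return {c: value / evidence for c, value in joint.items()}


# Hypothetical (mean, variance) pairs for two predictors, per class
priors = {0: 0.5, 1: 0.5}
params = {
    0: [(45.0, 25.0), (120.0, 100.0)],
    1: [(55.0, 25.0), (135.0, 100.0)],
}

midpoint = posterior((50.0, 127.5), priors, params)
older = posterior((58.0, 140.0), priors, params)
```

The first query sits exactly halfway between the two class means, so the posterior is split evenly; the second sits closer to class 1 and is assigned a higher probability for that class.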
K-Nearest Neighbors (KNN) is a non-parametric learning algorithm widely employed for classification and regression tasks. The process is based on feature similarity, where the outcome for a new data point is decided by the majority vote (in classification) or average (in regression) of its ‘k’ closest neighbors in the feature space. KNN calculates the distance between data points, usually employing the Euclidean distance formula for a dataset with ‘n’ features:
$$d(x, y) = \sqrt{\sum_{i=1}^{n} \left(x_i - y_i\right)^{2}}$$
where x_i and y_i are the values of the i-th feature for the two data points.
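The classification rule can be sketched directly from this description: compute the Euclidean distance to every training point, keep the k nearest, and take a majority vote. The training points and labels below are hypothetical, and the function name is illustrative only.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Majority class among the k training points closest to the query."""
    distances = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    votes = Counter(y for _, y in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature training set with two well-separated groups
train_x = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
train_y = [0, 0, 0, 1, 1, 1]

near_first_group = knn_predict(train_x, train_y, (1.5, 1.5), k=3)
near_second_group = knn_predict(train_x, train_y, (8.5, 8.5), k=3)
```

In practice the features would usually be standardized first, since Euclidean distance is sensitive to the measurement scale of each predictor.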
Data splitting and hyperparameter tuning
The dataset was partitioned into training (70%) and testing (30%) subsets, ensuring that the distribution of CAC was consistent across both sets. This stratified splitting approach maintains the class proportions in both subsets, which is crucial for ensuring unbiased performance evaluation.
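A stratified 70/30 split of this kind can be sketched as follows: indices are shuffled within each outcome class and the same fraction is held out from each. This is an illustrative sketch, not the code used in the study; the function name, the seed, and the example labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx) with class proportions preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # randomize within each class
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical outcome vector: 60 participants with CAC = 0, 40 with CAC > 0
labels = [0] * 60 + [1] * 40
train_idx, test_idx = stratified_split(labels, test_fraction=0.3, seed=0)

train_rate = sum(labels[i] for i in train_idx) / len(train_idx)
test_rate = sum(labels[i] for i in test_idx) / len(test_idx)
```

Both subsets keep the 40% prevalence of the full sample, which is the property the stratified approach is meant to guarantee.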
To determine the optimal hyperparameters for our classification model, we employed a grid search methodology combined with five-fold cross-validation, applied exclusively to the training data. Grid search involved systematically working through multiple combinations of parameter values, cross-validating each combination to determine the best-performing model.
After determining the optimal hyperparameters that yielded the best performance, we used these parameters to train the model on the entire training dataset. The final cross-validated evaluation was conducted using the test data to assess the model’s performance, providing an unbiased estimation of the model’s ability to generalize to new, unseen data.
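The tuning procedure can be sketched as a loop over candidate values, each scored by five-fold cross-validation on the training data only. The sketch below is a simplified illustration under several assumptions: it tunes a single hyperparameter (the k of a small nearest-neighbor classifier), scores folds by accuracy rather than AUC, uses unstratified folds, and runs on hypothetical data with hypothetical function names.

```python
import math
import random
from collections import Counter


def knn_predict(train, query, k):
    """Majority class among the k nearest training rows."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]


def k_fold_indices(n, n_folds, seed=0):
    """Shuffle the indices 0..n-1 and deal them into n_folds folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::n_folds] for f in range(n_folds)]


def cv_accuracy(rows, k, n_folds=5):
    """Mean held-out accuracy across the folds for one candidate value of k."""
    scores = []
    for held_out in k_fold_indices(len(rows), n_folds):
        held = set(held_out)
        train = [rows[i] for i in range(len(rows)) if i not in held]
        correct = sum(knn_predict(train, rows[i][0], k) == rows[i][1]
                      for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)


def grid_search(rows, grid):
    """Return (best_value, mean_cv_score) over the candidate values."""
    results = {k: cv_accuracy(rows, k) for k in grid}
    best = max(results, key=results.get)
    return best, results[best]


# Hypothetical training data: two well-separated groups of 20 rows each
training_rows = ([((0.1 * i, 0.0), 0) for i in range(20)]
                 + [((10 + 0.1 * i, 0.0), 1) for i in range(20)])
best_k, best_score = grid_search(training_rows, grid=[1, 3, 5])

# Final step: use the selected value with the full training set on unseen data
held_out_prediction = knn_predict(training_rows, (9.5, 0.0), best_k)
```

Keeping the test data out of the search loop is what allows the final test-set estimate to remain unbiased.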
Model evaluation
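The prediction performance of our classification models was assessed using the Area Under the Receiver Operating Characteristic Curve (AUC-ROC) approach. To evaluate the proposed machine learning models, their prediction performance was compared with that of the traditional binary logistic regression (BLR) model. The ROC curve displays the sensitivity (true positive rate) against 1 − specificity (false positive rate) at different threshold levels, denoted by:
$$\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad 1 - \text{Specificity} = \frac{FP}{FP + TN}$$
where TP, FN, FP, and TN are the numbers of true positives, false negatives, false positives, and true negatives, respectively.
The AUC, which measures the total area under the ROC curve, offers a comprehensive evaluation of the model’s discriminative ability across all classification thresholds, with the curve running from (0,0) to (1,1). The AUC is computed as follows:
$$\text{AUC} = \int_{0}^{1} \text{TPR}\; d(\text{FPR})$$
where TPR is the true positive rate (sensitivity) and FPR is the false positive rate (1 − specificity).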
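The area can be approximated numerically by sweeping the classification threshold over the predicted scores and applying the trapezoidal rule to the resulting points. The sketch below assumes outcomes coded 0/1 and uses hypothetical labels, scores, and function names.

```python
def roc_points(labels, scores):
    """(FPR, TPR) pairs obtained by sweeping the threshold over the scores."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc_trapezoid(points):
    """Area under the piecewise-linear curve through the ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical outcomes (1 = CAC > 0) and predicted probabilities
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(labels, scores)
auc = auc_trapezoid(points)
```

In this example eight of the nine positive-negative pairs are ranked correctly, so the area equals 8/9, which matches the interpretation of the AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case.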
Statistical analysis
Binary logistic regression (BLR) was used as a baseline statistical model to compare with machine learning classifiers in predicting the presence of coronary artery calcium (CAC >0). The outcome variable was dichotomized CAC status (CAC = 0 vs. CAC >0), and the same set of predictor variables used in the machine learning models was included in the logistic regression model. These included age, overweight status, tobacco use, family history of cardiovascular disease, body mass index, LDL cholesterol, sodium, chloride, potassium, total cholesterol, glomerular filtration rate, blood glucose, monocytes, platelet count, alkaline phosphatase, resting systolic blood pressure, and maximum oxygen volume (MaxVO2).
No variable selection procedures (e.g., stepwise regression) were applied; all variables were included in the multivariable logistic regression to maintain consistency with the input used in machine learning algorithms. The BLR model was trained on the 70% training dataset, and performance was evaluated on the 30% test set using the area under the receiver operating characteristic curve (AUC), following the same evaluation protocol as the ML models.
Results
Firefighters’ characteristics and their univariate analyses with CAC.
Note: Values are presented as means (standard deviations) for continuous variables and counts (percentages) for categorical variables. The correlation coefficients (r) and p-values are derived from univariate analyses. For continuous variables, Pearson’s correlation coefficient was calculated. For binary categorical variables, point-biserial correlations were calculated.
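For reference, a point-biserial correlation is numerically the same as a Pearson correlation in which the binary variable is coded 0/1, so a single function covers both cases in the note above. The sketch below uses hypothetical values and a hypothetical function name; it does not reproduce any result from the table.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Continuous versus continuous: a perfectly linear hypothetical relationship
r_linear = pearson_r([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])

# Binary (coded 0/1) versus continuous: the point-biserial case
r_point_biserial = pearson_r([0, 0, 1, 1], [1.0, 2.0, 3.0, 4.0])
```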
The risk factors were ranked based on their correlation coefficients with CAC, with positive associations ordered from top to bottom and inverse associations from bottom to top, as shown in Figure 1. Age, resting systolic blood pressure, and monocytes were the strongest predictors positively associated with CAC. The maximum volume of oxygen, GFR, and sodium showed the strongest inverse associations with CAC.
Figure 1. The correlation coefficients between risk factors and CAC.
The comparison of average model performance with best tuning parameters using 5-fold CV in training dataset.
The comparison of model performance with best performance in training dataset.
The comparison of cross-validated model performance between different classifications in testing dataset.
Figure 2 displays the Receiver Operating Characteristic curves in the cross-validated testing dataset, illustrating the diagnostic performance of the different prediction models used in our research. The graph plots sensitivity (true positive rate) against 1 − specificity (false positive rate) for each classifier. The XGBoost model’s curve lies closest to the top-left corner of the plot, indicating the best combination of sensitivity and specificity and confirming the quantitative results.
Figure 2. The cross-validated receiver operating characteristics of different machine learning models.
Figure 3 displays the variable importance plot generated by the XGBoost algorithm in the cross-validated testing dataset, ranking predictors based on their impact on the model’s performance. The predictive model demonstrated that age, alkaline phosphatase, and GFR were the most important variables. The next most important features were platelet count, resting systolic blood pressure, and LDL cholesterol, highlighting their importance in evaluating the outcome variable. Variables such as glucose, monocytes, and total cholesterol contributed less to the model’s performance. The contribution of other clinical indicators, such as chloride, overweight status, and maximum volume of oxygen, was moderate. The model indicated that potassium, sodium, BMI, and tobacco use were of lower importance.
Figure 3. The cross-validated variable importance in XGBoost algorithm.
Overall, our results showed that XGBoost achieved the best performance among the six prediction algorithms.
Discussion
Given the increased prevalence of CAD risk factors such as obesity, diabetes, hypertension, and aging among firefighters, CAC has been identified as a critical tool for the early detection of coronary atherosclerosis.18,23,30 CAC is recognized as a reliable and independent predictor of atherosclerotic cardiovascular disease. 16 In this study, we introduced classification models based on XGBoost, Random Forest, Support Vector Machine, Naïve Bayes, and K-Nearest Neighbors, and compared their predictive accuracy against traditional binary logistic regression.
By incorporating clinical and lifestyle risk factors, we developed ML predictive models to facilitate early prediction and detection of CAC risk among firefighters, enhancing our understanding of the complexity of CAC. Our findings indicated significant associations between higher CAC and factors such as age, resting systolic blood pressure, and monocyte levels, whereas maximum oxygen uptake, glomerular filtration rate, and sodium were inversely associated with CAC. All the ML algorithms demonstrated superior predictive accuracy compared to BLR, with XGBoost achieving the highest AUC of 0.770, slightly outperforming NB (0.768) and SVM (0.765), moderately outperforming RF (0.749), and substantially outperforming KNN (0.671) and BLR (0.658) (Table 2). Although the AUCs of XGBoost, NB, and SVM were numerically similar, we chose to emphasize XGBoost due to its consistent performance across training and testing datasets, robustness to multicollinearity, and enhanced interpretability via feature importance scores (Figure 3). Our aim was not to test statistical significance among classifiers but to identify the most clinically applicable and explainable model for integration into occupational health systems. XGBoost’s transparency in identifying key predictors supports data-driven risk stratification and underscores its utility in practical implementation, despite its modest margin in predictive performance.
Most previous studies have focused on the risk factors and prediction models of CVD and CAC within the general population.14,23,31,32 Research specific to firefighters has been limited and often centered on general cardiovascular risk factors rather than predictive modeling for CAC. For instance, prior studies have employed ML methods like Random Forest to estimate the likelihood of cardiovascular disease in firefighters (Jones et al., 2020). 33 However, our study incorporates a comprehensive collection of predictors that were not utilized by these previous investigations, with a specific focus on CAC. Traditional statistical models have been employed in previous research. One common method is logistic regression; Farjo et al. (2020) applied it to predict CAC (0 vs >400) in the general population. 14 Logistic regression considers interaction effects only if interaction terms are explicitly specified in the model (usually only two-way interactions, if any, depending on the interest of the investigator) and is frequently employed to calculate odds ratios for only the main effects of the risk factors on health outcomes. While logistic regression models are valuable for assessing prespecified interaction effects and calculating odds ratios, they may not capture or discover the complex patterns and interactions present in datasets with diverse risk variables.
The clinical utility of prediction models in preventive cardiology has gained increasing attention, particularly as artificial intelligence (AI) methods continue to evolve. In recent studies, ML-based approaches have shown promise in improving the detection and prognostication of cardiovascular conditions. For example, contrast-free cardiovascular magnetic resonance imaging combined with AI was used for accurate identification of myocardial infarction, demonstrating the expanding role of AI in imaging-based risk prediction. 34 In another systematic review, machine learning techniques applied to SPECT data significantly improved prognostic predictions in coronary artery disease. 35 Our findings further contribute to this growing evidence base by demonstrating that ML models, particularly XGBoost, NB, and SVM, substantially outperform traditional binary logistic regression in predicting the presence of coronary artery calcium (CAC) using readily available clinical and lifestyle data.
Importantly, these models are not just algorithmic exercises—they have direct relevance to digital health implementation. For example, the XGBoost model can be embedded into electronic health record (EHR) systems as a real-time risk calculator to flag firefighters at elevated risk for subclinical coronary atherosclerosis. This model could also be integrated into mobile health (mHealth) platforms or occupational health dashboards to deliver personalized risk profiles and enable proactive decision-making. Such integration enhances routine health surveillance by automating CAC risk prediction based solely on structured EHR data, eliminating reliance on costly imaging in settings where it may not be feasible.
This informatics-driven approach is particularly valuable in occupational settings like firefighting, where physical exertion, exposure to toxins, and shift work all contribute to elevated cardiovascular risk. Embedding these ML tools into digital health infrastructure supports scalable, personalized screening workflows and aligns with current efforts to modernize occupational medicine through health informatics. Ultimately, this application has the potential to reduce cardiovascular events, improve operational readiness, and enhance long-term health outcomes in this high-risk workforce.
Our study’s achievement of an AUC of 0.770 underscores the superior pattern recognition capabilities of XGBoost (and cross-validation performance was nearly as good for NB and SVM), particularly in handling complex relationships and many predictors. This research is pioneering in using multiple machine learning algorithms, including XGBoost, for CAC prediction among firefighters, highlighting the potential of ML applications for health risk assessment in this high-risk group. This innovation is critical given the specific hazards firefighters face and the need for individualized predictive models that accurately identify CAC risks to inform and drive the implementation of targeted preventive strategies.
Strengths
Our study has several strengths. The incorporation of detailed clinical and lifestyle risk factors enables a comprehensive analysis of CAC risk factors specific to firefighters. Using ML predictive models, particularly XGBoost (and secondarily NB and SVM), not only improved the accuracy of CAC predictions but also uncovered complex data patterns that can inform both future research and preventive strategies. Moreover, our research emphasizes the transformative potential of machine learning in the field of preventive healthcare. By leveraging our models for early detection and timely intervention, we anticipate a significant reduction in the incidence and consequences of CAD among firefighters, a particularly vulnerable workforce. Early identification of at-risk individuals through innovative ML applications can enable prompt intervention strategies, potentially preventing the progression of CAD.
Limitations
Our study has several limitations. The cross-sectional design limited our ability to explore the temporal relationships between risk factors and the development of CAC. Future longitudinal studies are needed to validate the predictive models and explore these temporal associations, providing a more dynamic understanding of CAC risk factors among firefighters. Moreover, our study population consisted primarily of White and male participants in a single occupation, limiting the generalizability of our findings. Additionally, this study did not include a sample size calculation, as the analysis was based on existing clinical data. This limits our ability to assess whether the sample size was adequately powered for all predictors included in the model.
Conclusions
In conclusion, our research contributes important insights into the predictive modeling of CAC among firefighters, emphasizing the value of ML algorithms in enhancing early detection and preventive strategies. Such advancements in predictive analytics are instrumental for the proactive health management of firefighters, potentially mitigating the risks associated with their demanding and hazardous profession. The utilization and value of ML algorithms, particularly XGBoost, emphasize the potential for advancement in healthcare and highlight the necessity of focused preventive strategies for high-risk occupational populations.18,36 Future studies should explore additional machine learning algorithms and integrate new biomarkers and imaging techniques to refine CAC predictions and preventive strategies for firefighters.18,36–39
Clinical perspectives
Competency in medical knowledge
This study advances predictive modeling of CAC in firefighters by comparing multiple ML algorithms, demonstrating their superiority over traditional binary logistic regression. Significant associations were found between CAC and factors such as age, resting systolic blood pressure, and monocyte levels, while inverse associations were noted with maximum oxygen uptake, glomerular filtration rate, and sodium. Among all models tested, XGBoost achieved the highest area under the curve (AUC), underscoring its utility in identifying subclinical cardiovascular risk. These findings support the potential of ML models to deliver individualized, data-driven cardiovascular risk assessments and inform targeted preventive strategies in high-risk occupational populations.
Translational outlook
This study highlights the translational potential of ML-based models, particularly XGBoost, in improving early detection and preventive management of CAC in firefighters. These algorithms can be integrated into EHR systems or occupational health informatics platforms to enable real-time, automated risk stratification without the need for immediate imaging. By leveraging routinely collected health data, such models support more personalized and proactive clinical decision-making, particularly in settings where cardiovascular screening resources are limited. Future research should validate these models in longitudinal settings, incorporate additional biomarkers or imaging modalities, and explore deployment within mHealth tools to broaden their impact across digital health ecosystems.
Footnotes
Acknowledgements
We sincerely appreciate the firefighters’ participation in this study.
Ethical approval
Our study only involved secondary data analysis, so participants were not recruited directly. Our study was approved by Indiana University Office of Research Compliance, under the protocol number 20510 and the Ascension Health Institutional Review Board, under the protocol number: RIN20240002.
Informed consent
Written informed consent was obtained from individual participants included in the study. The data collection and analysis adhered to the confidentiality and privacy standards.
Author Contributions
All authors had responsibility for the data management and statistical analysis. NH and ML were responsible for the study concept and design. ML analyzed the data and developed the machine learning algorithms. ML drafted the manuscript. All authors (ML, JH, CM, YX, TZ, LZ, PM, JW, VK, SM, HN) made significant contributions to the analysis of data and review of the manuscript. All authors had responsibility for the data’s integrity and the precision of analyses.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
The data utilized in this article cannot be shared publicly due to privacy concerns for the individuals who participated in the study. The study data are available upon reasonable request. For inquiries regarding the data or the study, please contact the corresponding author, Mingyue Li, at
Patient and public involvement
This study used secondary data analysis, so patients and the public were not directly involved in this study. Therefore, no participants were specifically recruited for this study. Our study was approved by Indiana University Office of Research Compliance, as evidenced by the allocated protocol number 20510, and Ascension Health Institutional Review Board, as evidenced by RIN20240002.
