Using machine learning models to predict coronary artery calcium scores in firefighters

Abstract

Objective: To develop machine learning (ML) models for predicting coronary artery calcium (CAC) among firefighters and to compare their cross-validated performance with that of traditional binary logistic regression (BLR). Methods: This study used health records from 416 firefighters who underwent comprehensive health screenings at Ascension Public Safety Medical. CAC was assessed with cardiac computed tomography scans and quantified using Agatston scores. Seventeen clinical and lifestyle-related risk variables were collected. Five ML models, namely XGBoost, Random Forest (RF), Support Vector Machine (SVM), Naïve Bayes (NB), and K-Nearest Neighbors (KNN), were developed and compared with one another and with BLR. Results: Among the 416 firefighters, age (r = 0.28, p < 0.0001), glucose levels (r = 0.13, p = 0.001), monocyte percentages (r = 0.13, p = 0.001), and resting systolic blood pressure (r = 0.13, p = 0.009) were positively associated with CAC, whereas sodium levels (r = −0.11, p = 0.038), GFR (r = −0.17, p = 0.021), and maximum oxygen volume (r = −0.19, p = 0.0002) were inversely associated with CAC. XGBoost achieved the highest cross-validated area under the curve (AUC) of 0.770, outperforming NB (0.768), SVM (0.765), RF (0.749), KNN (0.671), and BLR (0.658). Conclusion: Our research demonstrates the efficacy of ML algorithms, particularly XGBoost, in enhancing early detection and preventive strategies for CAC among firefighters. These advancements are crucial for proactive health management in this high-risk group, potentially mitigating risks associated with their demanding profession.

Graphical Abstract

Keywords

machine learning algorithms; coronary artery calcification; coronary artery disease; firefighters

Highlights

• Seventeen clinical and lifestyle risk factors for CAC were used in predictive modeling.

• Machine learning models outperformed traditional binary logistic regression.

• Study highlights ML’s potential for early CAC detection in firefighters.

Introduction

Firefighting, one of the most hazardous occupations, exposes individuals to extreme conditions such as hazardous chemicals, severe temperatures, and oxygen-deprived environments.^1–5 The physical demands of the job, combined with the use of heavy protective equipment, place significant strain on the cardiovascular system.⁶ Consequently, sudden cardiac death remains the leading cause of on-duty fatalities among firefighters, accounting for nearly 50% of all line-of-duty deaths.⁷ Although firefighters may be perceived as healthier than the general population due to mandatory physical fitness standards, they face elevated cardiovascular risk due to a combination of occupational stressors—such as heat exposure, exertion, and disrupted sleep cycles—and traditional coronary artery disease (CAD) risk factors including obesity, hypertension, diabetes, and aging.^8–12

Given this unique risk profile, firefighters represent a critical population for early detection of subclinical atherosclerosis. Coronary artery calcium (CAC) scoring has emerged as a robust, non-invasive tool for assessing cardiovascular risk, offering a quantifiable measure of coronary atherosclerosis, and serving as an independent predictor of future cardiac events.^13–16 This study underscores the importance of predicting CAC using patient characteristics to enhance the health and safety of firefighters, aiming to promote early detection and adoption of preventative measures against cardiac events in this vulnerable and high-risk population.

Recent advancements in artificial intelligence (AI), particularly in machine learning (ML), have revolutionized medical diagnostics, offering unprecedented accuracy in clinical outcome predictions.^17–22 ML algorithms excel at processing extensive datasets and have demonstrated superior pattern recognition abilities compared to humans, making them invaluable for understanding the complex risk factors linked to CAC.^23–28 Despite the potential of ML, few studies have leveraged these algorithms to predict CAC in firefighters. We hypothesize that machine learning algorithms can outperform traditional logistic regression models in predicting CAC. Therefore, our research aims to employ ML algorithms to examine the risk factors for CAC and predict individual CAC in firefighters. By integrating clinical related factors and lifestyle risk factors unique to firefighting, we aim to develop a predictive model that serves as a valuable resource for proactive firefighter health management. Our comprehensive approach leverages various risk parameters to enhance the accuracy of CAC predictions, potentially revolutionizing preventive healthcare practices for this high-risk demographic.

Methods

Study population

This study utilized a comprehensive set of data, including electronic health records, clinical evaluations, and laboratory findings, from a cohort of firefighters. The data were collected at Ascension Public Safety Medical, an occupational health facility that provides both pre-employment and annual medical assessments for firefighters across multiple departments in the Midwest, primarily based in Indiana. No formal research questionnaire was administered as part of this study. Participants were active-duty firefighters aged 35 to 68 years who completed a full annual health screening. The cohort predominantly (93.3%) comprised male firefighters. This screening included demographic information, a physical examination, clinical laboratory testing, a physical fitness assessment, and a non-contrast computed tomography scan for CAC scoring. Individuals were excluded if they were not actively employed as firefighters at the time of evaluation or had incomplete health records. All data were extracted from standardized electronic health records and reviewed for completeness and quality. Informed consent for the utilization of their health records in research was obtained at the time of these assessments. The ethical conduct of this study was ensured through approval from the XXX (protocol number XXX) and the XXX (protocol number: XXX). All research involving the human subjects was performed in accordance with the Declaration of Helsinki and relevant guidelines and regulations. Informed consent was obtained from all participants.

Coronary artery calcium scores assessment

CAC levels were assessed using non-contrast computed tomography (CT) imaging, a reliable non-invasive method for evaluating subclinical coronary atherosclerosis. Participants were scanned using an Imatron C Electron Beam Tomography (EBT) scanner following a standardized protocol designed to optimize image quality while minimizing radiation exposure. CAC scores were calculated using the Agatston method, which combines lesion area and density for quantification. Lesions were defined as regions with a density exceeding 130 Hounsfield units and a minimum area of 1 mm². The total CAC score was computed as the sum of all lesion scores. A board-certified radiologist specializing in cardiovascular imaging, blinded to participants’ clinical data, independently reviewed all scans to confirm scoring accuracy. For this analysis, CAC was dichotomized into two categories: CAC = 0 (no detectable coronary calcium) and CAC >0 (any detectable coronary calcium), a classification frequently used in population studies to distinguish between absence and presence of subclinical atherosclerosis. Additional methodological details are provided in Hoff et al. (2001).²⁹
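The scoring rule described above can be illustrated with a minimal sketch. It assumes the standard Agatston density weights (1 for 130–199 HU, 2 for 200–299 HU, 3 for 300–399 HU, 4 for 400 HU or more), which are not listed in this paper, and a hypothetical list of (area, peak density) lesion pairs. It shows the arithmetic only and is not the scanner software used in the study.

```python
def density_weight(peak_hu):
    """Standard Agatston weighting factor for a lesion's peak density (HU)."""
    if peak_hu >= 400:
        return 4
    if peak_hu >= 300:
        return 3
    if peak_hu >= 200:
        return 2
    if peak_hu >= 130:
        return 1
    return 0  # below the 130 HU threshold: not counted as calcium


def agatston_score(lesions, min_area_mm2=1.0):
    """Sum area * density weight over lesions of at least min_area_mm2."""
    return sum(
        area * density_weight(peak_hu)
        for area, peak_hu in lesions
        if area >= min_area_mm2
    )


def cac_present(score):
    """Dichotomize as in this analysis: 0 for CAC = 0, 1 for CAC > 0."""
    return int(score > 0)


# Hypothetical lesions: (area in mm^2, peak density in HU)
example_lesions = [(4.0, 150), (2.5, 320), (0.5, 500)]
# 4.0 * 1 + 2.5 * 3; the third lesion is below the 1 mm^2 minimum area
total = agatston_score(example_lesions)
```

The dichotomization step mirrors the CAC = 0 versus CAC >0 outcome used throughout the analysis.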

Risk factors

Age was recorded in actual years. Body Mass Index (BMI) was calculated for each participant by dividing their weight in kilograms by the square of their height in meters, serving as an index to determine overweight status, with a BMI of 25 kg/m² or higher indicating overweight.
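As a small worked example of the BMI definition and the overweight cutoff above (the weights and heights are hypothetical values, not study data):

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2


def is_overweight(weight_kg, height_m, cutoff=25.0):
    """Overweight flag using the 25 kg/m^2 cutoff described in the text."""
    return bmi(weight_kg, height_m) >= cutoff


# Hypothetical participants
flag_a = is_overweight(90.0, 1.80)  # BMI of about 27.8
flag_b = is_overweight(70.0, 1.80)  # BMI of about 21.6
```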

Participants underwent blood tests to analyze their lipid profiles, specifically measuring levels of low-density lipoprotein (LDL) cholesterol and total cholesterol, reported in milligrams per deciliter (mg/dL). Electrolyte levels, including sodium, chloride, and potassium, were quantified in millimoles per liter (mmol/L). Additional evaluations included renal function assessed by glomerular filtration rate (GFR), blood glucose levels, and a complete blood count with a particular emphasis on monocyte percentage and platelet count (measured in thousands per cubic millimeter, K/CUMM).

Participants self-reported their tobacco use by answering yes or no on a health risk questionnaire (HRA). A family history of cardiovascular disease was also recorded in the HRA if participants reported a history of heart disease among direct family members. Resting systolic blood pressure and alkaline phosphatase levels were measured. Cardiorespiratory fitness was evaluated by measuring the maximum volume of oxygen consumption (MaxVO₂).

This comprehensive data collection was designed to construct a multidimensional health profile for each firefighter, considering physical metrics, lifestyle factors, and medical history to inform the predictive modeling of CAC.

Machine learning algorithms

The Random Forest (RF) algorithm employs randomness to generate a multitude of decision trees. The combined output of these trees is consolidated into a cohesive outcome using a majority vote system for classification tasks and a mean value calculation for regression activities. Randomization in RF occurs through two primary methods. First, the dataset undergoes “bootstrap sampling,” where samples are drawn with replacement to create multiple bootstrap samples. This process is known as “bootstrap aggregation” or “bagging.” Second, randomization occurs at the decision nodes, where a subset of predictors is selected at each node. Typically, for a dataset with p predictors, the number of predictors chosen is approximately the square root of p, although this parameter can be adjusted. This random selection of variables and evaluation of thresholds continues until nodes become “pure” (containing only cases or controls) or meet a predetermined stopping criterion. The process of building trees is repeated multiple times, typically between 100 and 1000 iterations, to form the random forest. Once the random forest model is built, it is used for predictions. Every new instance is processed by all the trees within the model, and the forecasted result is determined by the majority class (the class that gains the highest number of “votes” from the trees) for classification tasks, or by calculating the mean of all the predictions for regression tasks.
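The three ingredients described above (bootstrap sampling, a random subset of roughly √p predictors per node, and majority voting) can be sketched as follows. This is an illustrative outline with hypothetical helper names, not the implementation used in the study, and it omits the tree-growing step itself.

```python
import math
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (one bootstrap sample)."""
    return [rng.choice(rows) for _ in rows]


def n_features_per_split(p):
    """Common default: about sqrt(p) candidate predictors at each node."""
    return max(1, round(math.sqrt(p)))


def majority_vote(tree_predictions):
    """Class receiving the most votes across the trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(0)
rows = list(range(10))                 # stand-in for 10 training records
sample = bootstrap_sample(rows, rng)   # same size, duplicates allowed
m = n_features_per_split(17)           # 17 predictors, as in this study
vote = majority_vote([1, 0, 1, 1, 0])  # three of five trees vote for class 1
```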

Extreme Gradient Boosting (XGBoost) is a powerful implementation of the gradient boosting algorithm, which integrates the outputs of numerous weak models, typically decision trees, to form a robust predictive model. XGBoost iteratively adds trees in a sequence to correct errors made by the preceding trees, enhancing model performance. It supports a wide range of predictive modeling tasks, including regression, classification, and ranking. A distinguishing feature of XGBoost is its incorporation of regularization (L1 and L2), which helps prevent overfitting and improves model generalizability. The following objective function is optimized by XGBoost at each iteration t.

L^{(t)} = \sum_{i = 1}^{n} l (y_{i}, \hat{y}_{i}^{(t - 1)} + f_{t} (x_{i})) + \Omega (f_{t})

where $n$ is the number of observations in the dataset, $y_{i}$ is the actual value of the $i$-th observation, $\hat{y}_{i}^{(t - 1)}$ is the prediction for the $i$-th observation at iteration $t - 1$, $f_{t}$ denotes the decision tree added at the $t$-th iteration, $l$ is a differentiable convex loss function that measures the difference between the predicted value and the actual value, and the regularization term $\Omega$ is defined as:

\Omega (f_{t}) = \gamma T + \frac{1}{2} \lambda \sum_{j = 1}^{T} w_{j}^{2}

Here $T$ is the number of leaves in the tree, $w_{j}$ is the score on the $j$-th leaf, and $\gamma$ and $\lambda$ are parameters that control the complexity of the model: $\gamma$ is the penalty for the number of leaves and $\lambda$ is the L2 regularization term on the leaf weights. This regularization term improves the model’s generalizability by shrinking the learned leaf weights, which helps prevent overfitting.
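To make the regularization term concrete, the sketch below evaluates the penalty for a small tree and computes the leaf weight that minimizes the usual second-order approximation of the objective, w* = −Σg_i / (Σh_i + λ). That closed form comes from the general XGBoost literature rather than from this paper, and the gradients, Hessians, and parameter values are hypothetical.

```python
def omega(leaf_weights, gamma, lam):
    """Regularization term: gamma * T + 0.5 * lambda * sum of squared leaf weights."""
    n_leaves = len(leaf_weights)
    return gamma * n_leaves + 0.5 * lam * sum(w * w for w in leaf_weights)


def optimal_leaf_weight(grads, hessians, lam):
    """Leaf weight minimizing the second-order approximation of the objective:
    w* = -sum(g_i) / (sum(h_i) + lambda)."""
    return -sum(grads) / (sum(hessians) + lam)


# Hypothetical tree with three leaves
penalty = omega([0.5, -1.0, 0.25], gamma=1.0, lam=2.0)
# 1.0 * 3 + 0.5 * 2.0 * (0.25 + 1.0 + 0.0625) = 4.3125

# Hypothetical first and second derivatives of the loss for the cases in one leaf
w_star = optimal_leaf_weight([0.4, -0.2, 0.3], [0.25, 0.25, 0.25], lam=1.0)
```

A larger λ pulls the optimal leaf weight toward zero, which is the shrinkage effect of the L2 term.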

Support Vector Machine (SVM) is a robust and versatile supervised learning model capable of handling both regression and classification tasks. The fundamental concept of SVM is to identify the optimal hyperplane that best separates the classes in the feature space. The ideal hyperplane maximizes the margin, defined as the distance to the nearest training data points from either class, thereby enhancing the classifier’s robustness. Support vectors, the data points closest to the hyperplane from each class, are critical in this process as they determine the hyperplane’s position and orientation. These support vectors influence the decision boundary and are pivotal in the construction of the SVM model.

When applied to binary classification, SVM seeks the hyperplane that best separates the two classes. In a space with d dimensions, the equation of a hyperplane is:

w \cdot x + b = 0

where $w$ is the weight vector, $x$ is the input feature vector, and $b$ is the bias. The goal is to minimize the norm of the weight vector $\|w\|$ (equivalently, to maximize the margin) while correctly classifying the training examples, so the problem is:

\min_{w, b} \frac{1}{2} \|w\|^{2}

subject to the following condition for every $i$-th instance:

y_{i} (w \cdot x_{i} + b) \geq 1, \forall i

where $y_{i} \in \{- 1, 1\}$ is the class label of $x_{i}$. This condition guarantees that each data point lies on the correct side of the hyperplane, on or outside the margin. If the classes are not linearly separable, slack variables $\epsilon_{i} \geq 0$ can be introduced to tolerate noisy or difficult examples that violate the margin:

y_{i} (w \cdot x_{i} + b) \geq 1 - \epsilon_{i}, \forall i

Then the problem is:

\min_{w, b, \epsilon} \frac{1}{2} \|w\|^{2} + C \sum_{i = 1}^{n} \epsilon_{i}

where $C$ is the regularization parameter that controls the trade-off between a wide margin and margin violations.
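The soft-margin objective above can be evaluated directly for a candidate hyperplane. The sketch below uses the fact that, at the optimum, each slack variable equals max(0, 1 − y_i(w · x_i + b)); the weight vector, bias, and data points are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def decision_value(w, b, x):
    """Signed score w . x + b; its sign gives the predicted class."""
    return dot(w, x) + b


def slack(w, b, x, y):
    """Margin violation for one example: max(0, 1 - y * (w . x + b))."""
    return max(0.0, 1.0 - y * decision_value(w, b, x))


def soft_margin_objective(w, b, X, Y, C):
    """0.5 * ||w||^2 + C * sum of slacks."""
    return 0.5 * dot(w, w) + C * sum(slack(w, b, x, y) for x, y in zip(X, Y))


# Hypothetical hyperplane and three labeled points
w, b = [1.0, 0.0], 0.0
X = [[2.0, 0.0], [-2.0, 1.0], [0.5, 3.0]]
Y = [1, -1, 1]
slacks = [slack(w, b, x, y) for x, y in zip(X, Y)]     # third point sits inside the margin
objective = soft_margin_objective(w, b, X, Y, C=1.0)   # 0.5 * 1.0 + 1.0 * 0.5
```

Raising C makes margin violations more expensive, so the fitted boundary tolerates fewer of them.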

Based on Bayes’ theorem, the Naïve Bayes (NB) algorithm is a classification method that calculates the probability of a class given a set of features that are assumed to be conditionally independent. The equation of the Naïve Bayes algorithm is:

P (C | X) = \frac{P (X | C) P (C)}{P (X)}

The posterior probability of class C given predictor X is $P (C | X)$ , the likelihood of a predictor given a class is denoted as $P (X | C)$ , the prior probability of the class is represented by $P (C)$ , and the marginal probability of the predictor is denoted as $P (X)$ . Naïve Bayes simplifies the conditional probability $P (X | C)$ by assuming that the features are independent given the class, allowing it to be computed efficiently as the product $P (X_{1} | C) P (X_{2} | C) \cdots P (X_{n} | C)$ for classification tasks. Despite its simplicity and the assumption of feature independence, Naïve Bayes can be highly effective in tasks such as text classification, including spam detection and sentiment analysis, by selecting the class with the highest posterior probability as the predicted result.
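A minimal sketch of this calculation for categorical features follows. The class labels, priors, and conditional probabilities are hypothetical and chosen only to make the arithmetic easy to follow; they are not estimates from this study.

```python
def naive_bayes_posterior(priors, likelihoods, observed):
    """Posterior P(C | X) under the conditional-independence assumption.

    priors: {class: P(C)}
    likelihoods: {class: [ {value: P(X_j = value | C)} for each feature j ]}
    observed: list with the observed value of each feature
    """
    joint = {}
    for c, prior in priors.items():
        p = prior
        for j, value in enumerate(observed):
            p *= likelihoods[c][j][value]   # multiply per-feature likelihoods
        joint[c] = p
    evidence = sum(joint.values())          # P(X), the normalizing constant
    return {c: p / evidence for c, p in joint.items()}


# Hypothetical probabilities for two binary features
priors = {"positive": 0.6, "negative": 0.4}
likelihoods = {
    "positive": [{"yes": 0.5, "no": 0.5}, {"yes": 0.3, "no": 0.7}],
    "negative": [{"yes": 0.25, "no": 0.75}, {"yes": 0.1, "no": 0.9}],
}
posterior = naive_bayes_posterior(priors, likelihoods, ["yes", "yes"])
predicted = max(posterior, key=posterior.get)   # class with the highest posterior
```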

K-Nearest Neighbors (KNN) is a non-parametric learning algorithm widely employed for classification and regression tasks. The process is based on feature similarity, where the outcome for a new data point is decided by the majority vote (in classification) or average (in regression) of its ‘k’ closest neighbors in the feature space. KNN calculates the distance between data points, usually employing the Euclidean distance formula for a dataset with ‘n’ features:

d (x, y) = \sqrt{\sum_{i = 1}^{n} {(x_{i} - y_{i})}^{2}}

where $d (x, y)$ is the distance between points $x$ and $y$, and $x_{i}$ and $y_{i}$ are the values of the $i$-th feature of $x$ and $y$, respectively. The choice of ‘k’ significantly influences the algorithm’s performance. A smaller ‘k’ can make the model sensitive to noise, while a larger ‘k’ can smooth out the classification or regression results but may ignore small patterns. Therefore, selecting an optimal ‘k’ is crucial for achieving accurate results. KNN is particularly effective for applications where the data is well-segmented into distinct classes. However, its computational complexity increases with the size of the dataset, as the algorithm must compute the distance to all training points for each prediction.
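The distance formula and the majority vote can be sketched in a few lines. The training points and labels below are hypothetical, and the brute-force search reflects the computational cost noted above, since every prediction scans all training points.

```python
import math
from collections import Counter


def euclidean(x, y):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def knn_predict(train_X, train_y, query, k):
    """Majority class among the k training points closest to the query."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: euclidean(pair[0], query))
    nearest_labels = [label for _, label in ranked[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical two-feature training set with two classes
train_X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [6.0, 5.0]]
train_y = [0, 0, 0, 1, 1]
pred_near_origin = knn_predict(train_X, train_y, [0.5, 0.5], k=3)
pred_far = knn_predict(train_X, train_y, [5.5, 5.0], k=3)
```

In practice the features would usually be standardized first, since Euclidean distance is sensitive to measurement scale.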

Data splitting and hyperparameter tuning

The dataset was partitioned into training (70%) and testing (30%) subsets, ensuring that the distribution of CAC was consistent across both sets. This stratified splitting approach maintains the class proportions in both subsets, which is crucial for ensuring unbiased performance evaluation.
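A stratified 70/30 split can be sketched as below. The function name, seed, and outcome vector are hypothetical; the study's actual splitting code is not described, so this only illustrates the idea of sampling within each class.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx) with class proportions preserved in both subsets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)                            # random order within each class
        n_test = round(len(idx) * test_fraction)    # per-class share sent to the test set
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical outcome vector: 60 positives and 40 negatives
labels = [1] * 60 + [0] * 40
train_idx, test_idx = stratified_split(labels)
```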

To determine the optimal hyperparameters for our classification model, we employed a grid search methodology combined with five-fold cross-validation, applied exclusively to the training data. Grid search involved systematically working through multiple combinations of parameter values, cross-validating each combination to determine the best-performing model.

After determining the hyperparameters that yielded the best cross-validated performance, we used them to train the model on the entire training dataset. The final evaluation of the cross-validated model was conducted on the held-out test data, providing an unbiased estimate of the model’s ability to generalize to new, unseen data.
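The tuning procedure described in this section can be outlined as follows. The `fit_and_score` callable, the parameter names, and the toy scoring function are hypothetical placeholders: in a real analysis the callable would train a classifier on the training folds and return a validation metric such as AUC.

```python
import itertools


def k_fold_indices(n, k=5):
    """Yield (train_idx, val_idx) pairs for k contiguous folds over range(n)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size


def grid_search_cv(param_grid, fit_and_score, n, k=5):
    """Return (best_params, best_mean_score) over every parameter combination."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = [fit_and_score(params, tr, va) for tr, va in k_fold_indices(n, k)]
        mean_score = sum(scores) / len(scores)   # average over the k folds
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


def toy_score(params, train_idx, val_idx):
    """Hypothetical stand-in for 'train on train_idx, score on val_idx'."""
    return -abs(params["depth"] - 3) - params["eta"]


best_params, best_score = grid_search_cv(
    {"depth": [2, 3, 4], "eta": [0.1, 0.3]}, toy_score, n=20
)
```

Only training-set indices are passed to the search, so the test set plays no part in choosing hyperparameters. Records would normally be shuffled (or stratified) before the folds are formed.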

Model evaluation

The prediction performance of our classification models was assessed using the area under the receiver operating characteristic curve (AUC-ROC), and the performance of the proposed machine learning models was compared with that of the traditional binary logistic regression (BLR) model. The ROC curve plots sensitivity (true positive rate) against 1 − specificity (false positive rate) at different threshold levels, where:

\text{Sensitivity} = \frac{TP}{TP + FN}, \quad 1 - \text{Specificity} = \frac{FP}{FP + TN}

with $TP$, $FN$, $FP$, and $TN$ denoting the numbers of true positives, false negatives, false positives, and true negatives, respectively.

The AUC, which measures the total area under the ROC curve as it runs from (0,0) to (1,1), offers a comprehensive evaluation of the model’s discriminative ability across all classification thresholds. The AUC is computed as follows:

\mathrm{AUC} = \int_{0}^{1} \mathrm{ROC} (t) \, dt

where ROC(t) is the sensitivity (true positive rate) attained when the false positive rate equals t. An AUC of 1 indicates perfect predictive accuracy, whereas an AUC of 0.5 represents a model with discriminative capacity similar to random guessing, shown as a diagonal line in the ROC space. The AUC was calculated using the test data to provide an unbiased evaluation of the model’s ability to distinguish between positive and negative classes.
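In practice the integral is approximated from a finite set of predictions. The sketch below builds the empirical ROC points by lowering the decision threshold over the observed scores and applies the trapezoidal rule; the labels and scores are hypothetical.

```python
def roc_points(y_true, scores):
    """(false positive rate, true positive rate) points, from strictest to loosest threshold."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc_trapezoid(points):
    """Area under the piecewise-linear curve through the ROC points."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical labels (1 = positive class) and predicted scores
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = auc_trapezoid(roc_points(y_true, scores))
```

For this toy example, 8 of the 9 positive-negative pairs are ranked correctly, so the area equals 8/9, which matches the interpretation of AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case.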

Statistical analysis

Binary logistic regression (BLR) was used as a baseline statistical model for comparison with the machine learning classifiers in predicting the presence of coronary artery calcium (CAC >0). The outcome variable was dichotomized CAC status (CAC = 0 vs. CAC >0), and the same set of predictor variables used in the machine learning models was included in the logistic regression model. These were age, overweight status, tobacco use, family history of cardiovascular disease, body mass index, LDL cholesterol, sodium, chloride, potassium, total cholesterol, glomerular filtration rate, blood glucose, monocytes, platelet count, alkaline phosphatase, resting systolic blood pressure, and maximum oxygen volume (MaxVO₂).

No variable selection procedures (e.g., stepwise regression) were applied; all variables were included in the multivariable logistic regression to maintain consistency with the input used in machine learning algorithms. The BLR model was trained on the 70% training dataset, and performance was evaluated on the 30% test set using the area under the receiver operating characteristic curve (AUC), following the same evaluation protocol as the ML models.
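For reference, the prediction step of a fitted logistic regression model can be sketched as follows. The intercept and coefficients are hypothetical values for two standardized predictors, not estimates from this study.

```python
import math


def logistic_probability(intercept, coefficients, x):
    """P(CAC > 0 | x) = 1 / (1 + exp(-(b0 + sum_j b_j * x_j)))."""
    z = intercept + sum(b * v for b, v in zip(coefficients, x))
    return 1.0 / (1.0 + math.exp(-z))


def odds_ratio(coefficient):
    """Odds ratio for a one-unit increase in the corresponding predictor."""
    return math.exp(coefficient)


# Hypothetical coefficients: the two effects cancel, so the linear predictor is 0
p = logistic_probability(0.0, [0.5, -0.5], [1.0, 1.0])
```

Because the linear predictor contains only main effects unless interaction terms are added explicitly, each coefficient maps directly to an odds ratio.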

Results

Descriptive statistics for the risk factors and their associations with CAC are shown in Table 1. For continuous variables, the mean and standard deviation were computed, while for categorical variables, the count and percentage of firefighters were determined. Age showed the strongest association with CAC among firefighters, with older individuals showing higher CAC (r = 0.28, p < 0.0001). Although overweight status, tobacco use, family history of cardiovascular disease, LDL cholesterol, and BMI were positively correlated with CAC, these associations did not reach statistical significance. Sodium levels were inversely related to CAC (r = −0.11, p = 0.038). Both glucose levels and monocyte percentages demonstrated a positive correlation with CAC (r = 0.13, p = 0.001 for both). GFR was inversely correlated with CAC (r = −0.17, p = 0.021), indicating that reduced kidney function was associated with higher CAC. Resting systolic blood pressure was positively associated with CAC (r = 0.13, p = 0.009), whereas higher maximum oxygen volume, a marker of cardiorespiratory fitness, was associated with lower CAC (r = −0.19, p = 0.0002).

Table 1.

Firefighters’ characteristics and their univariate analyses with CAC.

Variable	Mean (SD)/n (%)	r	p-value
Age, years	49.55 (6.51)	0.28	<0.0001
Overweight, yes	152 (36.54%)	0.08	0.107
Tobacco use	37 (8.89%)	0.07	0.144
Family history	82 (19.71%)	0.07	0.168
Body mass index, kg/m²	30.15 (4.38)	0.08	0.136
LDL cholesterol (mg/dL)	120.97 (32.56)	0.07	0.184
Sodium (mmol/L)	140.19 (1.89)	−0.11	0.038
Chloride (mmol/L)	104.87 (2.27)	−0.08	0.129
Potassium (mmol/L)	4.36 (0.33)	−0.04	0.237
Total cholesterol (mg/dL)	195.36 (36.11)	0.05	0.228
GFR	96.26 (14.16)	−0.17	0.021
Glucose	99.84 (25.38)	0.13	0.001
Monocytes (%)	7.38 (1.95)	0.13	0.001
Platelet count (K/CUMM)	242.49 (57.28)	−0.08	0.120
Alkaline phosphatase (u/l)	66.86 (18.85)	0.07	0.180
Resting systolic blood pressure	126.75 (10.00)	0.13	0.009
Maximum volume of oxygen	40.02 (4.94)	−0.19	0.0002

Note: Values are presented as means (standard deviations) for continuous variables and counts (percentages) for categorical variables. The correlation coefficients (r) and p-values are derived from univariate analyses. For continuous variables, Pearson’s correlation coefficient was calculated. For binary categorical variables, point-biserial correlations were calculated.
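The two coefficients named in the note share one formula: the point-biserial correlation is Pearson’s r computed with the binary variable coded as 0/1. A minimal sketch with hypothetical data:

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical binary factor (0 = no, 1 = yes) and a continuous measurement
binary_factor = [0, 0, 1, 1]
measurement = [1.0, 2.0, 3.0, 4.0]
r_pb = pearson_r(binary_factor, measurement)   # point-biserial correlation
```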

The risk factors were ranked by their correlation coefficients with CAC, as shown in Figure 1, with positive associations listed from the top down and inverse associations from the bottom up. Age, resting systolic blood pressure, and monocytes were the strongest predictors positively associated with CAC. The maximum volume of oxygen, GFR, and sodium showed the strongest inverse associations with CAC.

Figure 1.

The correlation coefficients between risk factors and CAC.

Table 2 presents the average performance of each model with its best tuning parameters under 5-fold cross-validation (CV) in the training dataset. Among the models tuned through 5-fold CV, XGBoost achieved the highest AUC, with a mean of 0.808. In comparison, the RF and SVM models achieved mean AUCs of 0.783 and 0.778, respectively, while the KNN method achieved a mean AUC of 0.709.

Table 2.

The comparison of average model performance with the best tuning parameters using 5-fold CV in the training dataset.

Methods	Sensitivity	Specificity	Accuracy	AUC
XGBoost	0.818	0.621	0.740	0.808
RF	0.775	0.588	0.702	0.783
SVM	0.839	0.623	0.754	0.778
KNN	0.957	0.260	0.682	0.709

Table 3 presents the performance of NB and BLR in the training dataset. The NB classifier showed an AUC of 0.805, whereas BLR showed a lower AUC of 0.763.

Table 3.

The comparison of model performance for NB and BLR in the training dataset.

Methods	Sensitivity	Specificity	Accuracy	AUC
NB	0.823	0.580	0.729	0.805
BLR	0.861	0.500	0.721	0.763

Table 4 presents the cross-validated performance of the different classifiers in the testing dataset. XGBoost achieved the highest AUC of all approaches (0.770), followed by NB (0.768), SVM (0.765), and RF (0.749). The KNN method and the BLR model showed lower AUCs of 0.671 and 0.658, respectively.

Table 4.

The comparison of cross-validated model performance between different classifiers in the testing dataset.

Methods	Sensitivity	Specificity	Accuracy	AUC
XGBoost	0.750	0.571	0.691	0.770
RF	0.785	0.571	0.714	0.749
SVM	0.893	0.500	0.762	0.765
NB	0.857	0.571	0.762	0.768
KNN	0.821	0.286	0.643	0.671
BLR	0.714	0.429	0.619	0.658
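The sensitivity, specificity, and accuracy columns reported in the tables above are derived from confusion-matrix counts at a single decision threshold. A minimal sketch with hypothetical counts (not the study’s confusion matrices, which are not reported):

```python
def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),                 # true positive rate
        "specificity": tn / (tn + fp),                 # true negative rate
        "accuracy": (tp + tn) / (tp + fn + tn + fp),   # overall proportion correct
    }


# Hypothetical counts: 40 positives and 20 negatives
metrics = classification_metrics(tp=30, fn=10, tn=12, fp=8)
```

Unlike the AUC, these three values change with the chosen threshold, which is why a model can have high sensitivity and low specificity while still showing a moderate AUC.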

Figure 2 displays the receiver operating characteristic (ROC) curves in the cross-validated testing dataset, illustrating the diagnostic performance of the prediction models used in our research. The graph plots sensitivity (true positive rate) against 1 − specificity (false positive rate) for each classifier. The XGBoost model’s curve lies closest to the top-left corner of the plot, reflecting the best overall trade-off between sensitivity and specificity and confirming the quantitative results.

Figure 2.

The cross-validated receiver operating characteristics of different machine learning models.

Figure 3 displays the variable importance plot generated by the XGBoost algorithm in the cross-validated testing dataset, ranking predictors based on their impact on the model’s performance. Age, alkaline phosphatase, and GFR were the most important variables, followed by platelet count, resting systolic blood pressure, and LDL cholesterol. Variables such as glucose, monocytes, and total cholesterol contributed less to the model’s performance. The contribution of other clinical indicators, such as chloride, overweight status, and maximum volume of oxygen, was moderate. Potassium, sodium, BMI, and tobacco use were of lower importance.

Figure 3.

The cross-validated variable importance in XGBoost algorithm.

Overall, our results showed that XGBoost achieved the best performance among the six prediction algorithms.

Discussion

Given the increased prevalence of CAD risk factors such as obesity, diabetes, hypertension, and aging among firefighters, CAC has been identified as a critical tool for the early detection of coronary atherosclerosis.^18,23,30 CAC is recognized as a reliable and independent predictor of atherosclerotic cardiovascular disease.¹⁶ In this study, we developed classification models based on XGBoost, Random Forest, Support Vector Machine, Naïve Bayes, and K-Nearest Neighbors, and compared their predictive accuracy with that of traditional binary logistic regression.

By incorporating clinical and lifestyle risk factors, we developed ML predictive models to facilitate early prediction and detection of CAC risk among firefighters, enhancing our understanding of the complexity of CAC. Our findings indicated significant associations between higher CAC and factors such as age, resting systolic blood pressure, and monocyte levels, whereas maximum oxygen uptake, glomerular filtration rate, and sodium levels were inversely associated with CAC. All the ML algorithms demonstrated superior predictive accuracy compared to BLR, with XGBoost achieving the highest AUC of 0.770, slightly outperforming NB (0.768) and SVM (0.765), moderately outperforming RF (0.749), and substantially outperforming KNN (0.671) and BLR (0.658) (Table 4). Although the AUCs of XGBoost, NB, and SVM were numerically similar, we chose to emphasize XGBoost due to its consistent performance across training and testing datasets, robustness to multicollinearity, and enhanced interpretability via feature importance scores (Figure 3). Our aim was not to test statistical significance among classifiers but to identify the most clinically applicable and explainable model for integration into occupational health systems. XGBoost’s transparency in identifying key predictors supports data-driven risk stratification and underscores its utility in practical implementation, despite its modest margin in predictive performance.

Most previous studies have focused on risk factors and prediction models for CVD and CAC in the general population.^14,23,31,32 Research specific to firefighters has been limited and has often centered on general cardiovascular risk factors rather than predictive modeling for CAC. For instance, prior studies have employed ML methods such as Random Forest to estimate the likelihood of cardiovascular disease in firefighters (Jones et al., 2020).³³ However, our study incorporates a comprehensive collection of predictors that were not utilized by these previous investigations, with a specific focus on CAC. Traditional statistical models have also been employed in previous research. One common method is logistic regression; Farjo et al. (2020) applied it to predict CAC (0 vs >400) in the general population.¹⁴ Logistic regression considers interaction effects only if interaction terms are explicitly specified in the model (usually only 2-way interactions, if any, depending on the interest of the investigator) and is frequently employed to calculate odds ratios for the association between only the main effects of the risk factors and health outcomes. While logistic regression models are valuable for assessing pre-specified interaction effects and calculating odds ratios, they may not capture or discover the complex patterns and interactions present in datasets with diverse risk variables.

The clinical utility of prediction models in preventive cardiology has gained increasing attention, particularly as artificial intelligence (AI) methods continue to evolve. In recent studies, ML-based approaches have shown promise in improving the detection and prognostication of cardiovascular conditions. For example, contrast-free cardiovascular magnetic resonance imaging combined with AI was used for accurate identification of myocardial infarction, demonstrating the expanding role of AI in imaging-based risk prediction.³⁴ In a systematic review, machine learning techniques applied to SPECT data significantly improved prognostic predictions in coronary artery disease.³⁵ Our findings further contribute to this growing evidence base by demonstrating that ML models, particularly XGBoost, NB, and SVM, substantially outperform traditional binary logistic regression in predicting the presence of coronary artery calcium (CAC) using readily available clinical and lifestyle data.

Importantly, these models are not just algorithmic exercises—they have direct relevance to digital health implementation. For example, the XGBoost model can be embedded into electronic health record (EHR) systems as a real-time risk calculator to flag firefighters at elevated risk for subclinical coronary atherosclerosis. This model could also be integrated into mobile health (mHealth) platforms or occupational health dashboards to deliver personalized risk profiles and enable proactive decision-making. Such integration enhances routine health surveillance by automating CAC risk prediction based solely on structured EHR data, eliminating reliance on costly imaging in settings where it may not be feasible.

This informatics-driven approach is particularly valuable in occupational settings like firefighting, where physical exertion, exposure to toxins, and shift work all contribute to elevated cardiovascular risk. Embedding these ML tools into digital health infrastructure supports scalable, personalized screening workflows and aligns with current efforts to modernize occupational medicine through health informatics. Ultimately, this application has the potential to reduce cardiovascular events, improve operational readiness, and enhance long-term health outcomes in this high-risk workforce.

The cross-validated AUC of 0.770 achieved by XGBoost, with NB (0.768) and SVM (0.765) performing nearly as well, underscores the pattern recognition capabilities of these models, particularly in handling complex relationships among many predictors. This research is pioneering in using multiple machine learning algorithms, including XGBoost, for CAC prediction among firefighters, highlighting the potential of ML applications for health risk assessment in this occupation. This innovation is critical given the specific hazards firefighters face and the need for individualized predictive models that accurately identify CAC risk and inform targeted preventive strategies.
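For readers less familiar with the metric, the AUC can be read as the probability that a randomly chosen firefighter with CAC receives a higher predicted score than a randomly chosen firefighter without CAC. A minimal sketch of that pairwise definition follows; the labels and scores are made up for illustration and are not study data, and the function name is hypothetical.

```python
from typing import Sequence


def pairwise_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 1 = CAC present, 0 = CAC absent (illustrative values only).
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
auc = pairwise_auc(labels, scores)
print(auc)
```

On this toy input, 9 of the 12 positive-negative pairs are ordered correctly, giving 0.75. An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking.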

Strengths

Our study has several strengths. The incorporation of detailed clinical and lifestyle risk factors enables a comprehensive analysis of CAC risk factors specific to firefighters. Using ML predictive models, particularly XGBoost (and secondarily NB and SVM), not only improved the accuracy of CAC predictions but also uncovered complex data patterns that can inform both future research and preventive strategies. Moreover, our research emphasizes the transformative potential of machine learning in the field of preventive healthcare. By leveraging our models for early detection and timely intervention, we anticipate a significant reduction in the incidence and consequences of CAD among firefighters, a particularly vulnerable workforce. Early identification of at-risk individuals through innovative ML applications can enable prompt intervention strategies, potentially preventing the progression of CAD.

Limitations

Our study has several limitations. The cross-sectional design limited our ability to explore the temporal relationships between risk factors and the development of CAC. Future longitudinal studies are needed to validate the predictive models and explore the temporal associations between risk factors and CAC development, providing a more dynamic understanding of CAC risk factors among firefighters. Moreover, our study population consisted primarily of white males in a specific occupation, limiting the generalizability of our findings. Additionally, this study did not include a sample size calculation, as the analysis was based on existing clinical data. This limits our ability to assess whether the study was adequately powered for all predictors included in the model.

Conclusions

In conclusion, our research contributes important insights into the predictive modeling of CAC among firefighters, emphasizing the value of ML algorithms in enhancing early detection and preventive strategies. Such advancements in predictive analytics are instrumental for the proactive health management of firefighters, potentially mitigating the risks associated with their demanding and hazardous profession. The utilization and value of ML algorithms, particularly XGBoost, emphasize the potential for advancement in healthcare and highlight the necessity of focused preventive strategies for high-risk occupational populations.¹⁸,³⁶ Future studies should explore additional machine learning algorithms and integrate new biomarkers and imaging techniques to refine CAC predictions and preventive strategies for firefighters.¹⁸,³⁶⁻³⁹

Clinical perspectives

Competency in medical knowledge

This study advances predictive modeling of CAC scores in firefighters by comparing multiple ML algorithms, demonstrating their superiority over traditional binary logistic regression. Significant positive associations were found between CAC scores and factors such as age, resting systolic blood pressure, and monocyte percentage, while inverse associations were noted with maximum oxygen uptake, glomerular filtration rate, and heart rate. Among all models tested, XGBoost achieved the highest cross-validated area under the curve (AUC), underscoring its utility in identifying subclinical cardiovascular risk. These findings support the potential of ML models to deliver individualized, data-driven cardiovascular risk assessments and inform targeted preventive strategies in high-risk occupational populations.

Translational outlook

This study highlights the translational potential of ML-based models, particularly XGBoost, in improving early detection and preventive management of CAC in firefighters. These algorithms can be integrated into EHR systems or occupational health informatics platforms to enable real-time, automated risk stratification without the need for immediate imaging. By leveraging routinely collected health data, such models support more personalized and proactive clinical decision-making, particularly in settings where cardiovascular screening resources are limited. Future research should validate these models in longitudinal settings, incorporate additional biomarkers or imaging modalities, and explore deployment within mHealth tools to broaden their impact across digital health ecosystems.

Footnotes

Acknowledgements

We sincerely appreciate the firefighters’ participation in this study.

ORCID iDs

Mingyue Li

Laura Y Zhou

Ethical approval

Our study involved only secondary data analysis, so participants were not recruited directly. Our study was approved by the Indiana University Office of Research Compliance (protocol number 20510) and the Ascension Health Institutional Review Board (protocol number RIN20240002).

Informed consent

Written informed consent was obtained from the individual participants included in the study. Data collection and analysis adhered to confidentiality and privacy standards.

Author Contributions

All authors had responsibility for the data management and statistical analysis. NH and ML were responsible for the study concept and design. ML analyzed the data and developed the machine learning algorithms. ML drafted the manuscript. All authors (ML, JH, CM, YX, TZ, LZ, PM, JW, VK, SM, HN) made significant contributions to the analysis of data and review of the manuscript. All authors had responsibility for the integrity of the data and the accuracy of the analyses.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

The data utilized in this article cannot be shared publicly due to privacy concerns for the individuals who participated in the study. The study data are available upon reasonable request. For inquiries regarding the data or the study, please contact the corresponding author, Mingyue Li, at mingyue.li@utsouthwestern.edu.

Patient and public involvement

This study used secondary data analysis, so patients and the public were not directly involved, and no participants were specifically recruited for this study. Our study was approved by the Indiana University Office of Research Compliance (protocol number 20510) and the Ascension Health Institutional Review Board (protocol number RIN20240002).

Appendix

References

1. Esteves, Slezakova, Madureira, et al. Firefighters’ occupational exposure in preparation for wildfire season: addressing biological impact. Toxics 2024; 12(3): 201.

2. Dzikowicz, Al-Zaiti, Carey. Cardiovascular function and deleterious adaptations among firefighters: Implications for Smart firefighting. Intelligent building fire safety and smart firefighting. Springer, 2024, pp. 455–473.

3. Crawford, Graveling. Non-cancer occupational health risks in firefighters. Occup Med 2012; 62(7): 485–495.

4. Heydari, Ostadtaghizadeh, Ardalan, et al. Exploring the criteria and factors affecting firefighters’ resilience: a qualitative study. Chin J Traumatol (Engl Ed) 2022; 25(02): 107–114.

5. Hollerbach, Mathias, Stewart, et al. A cross-sectional examination of 10-year atherosclerotic cardiovascular disease risk among US firefighters by age and weight status. J Occup Environ Med 2020; 62(12): 1063–1068.

6. Laukkanen, Savonen, Hupin, et al. Cardiorespiratory optimal point during exercise testing and sudden cardiac death: a prospective cohort study. Prog Cardiovasc Dis 2021; 68: 12–18.

7. Gonzalez, Lanham, Martin, et al. (eds). Firefighter Health: A Narrative Review of Occupational Threats and Countermeasures. Healthcare. MDPI, 2024.

8. Barros, Oliveira, Morais. Firefighters’ occupational exposure: contribution from biomarkers of effect to assess health risks. Environ Int 2021; 156: 106704.

9. Igboanugo, Bigelow, Mielke. Health outcomes of psychosocial stress within firefighters: a systematic review of the research landscape. J Occup Health 2021; 63(1): e12219.

10. McDonough, Phillips, Twilbeck. Determining best practices to reduce occupational health risks in firefighters. J Strength Condit Res 2015; 29(7): 2041–2044.

11. McQuerry, Kwon, Poley-Bogan. Female firefighters’ increased risk of occupational exposure due to ill-fitting personal protective clothing. Front Mater 2023; 10: 1175559.

12. Noh, Lee, Jamrasi, et al. Physical fitness levels of South Korean national male and female firefighters. J Exerc Sci Fit 2020; 18(3): 109–114.

13. Beroukhim Afrahimi, Kinninger, et al. Establishing a coronary calcium scoring threshold as a gateway to invasive testing for firefighters undergoing fitness for duty exams. Circulation 2021; 144(Suppl_1): A9412–A9413.

14. Farjo, Yanamala, Kagiyama, et al. Prediction of coronary artery calcium scoring from surface electrocardiogram in atherosclerotic cardiovascular disease: a pilot study. Eur Heart J Digit Health 2020; 1(1): 51–61.

15. Cheong, Wilson, Spann, et al. Coronary artery calcium scoring: an evidence‐based guide for primary care physicians. J Intern Med 2021; 289(3): 309–324.

16. Bell, White, Hassan, et al. Evaluation of the incremental value of a coronary artery calcium score beyond traditional cardiovascular risk assessment: a systematic review and meta-analysis. JAMA Intern Med 2022; 182(6): 634–642.

17. Rajpurkar, Chen, Banerjee, et al. AI in health and medicine. Nat Med 2022; 28(1): 31–38.

18. Zhu, Yin, Schoepf, et al. Machine learning for the prevalence and severity of coronary artery calcification in nondialysis chronic kidney disease patients: a Chinese large cohort study. J Thorac Imag 2022; 37(6): 401–408.

19. Johnson, Wei, Weeraratne, et al. Precision medicine, AI, and the future of personalized health care. Clin Transl Sci 2021; 14(1): 86–93.

20. Kulkarni, Seneviratne, Baig, et al. Artificial intelligence in medicine: where are we now? Acad Radiol 2020; 27(1): 62–70.

21. Kelly, Karthikesalingam, Suleyman, et al. Key challenges for delivering clinical impact with artificial intelligence. BMC Med 2019; 17: 195–199.

22. Gruson, Bernardini, Dabla, et al. Collaborative AI and laboratory medicine integration in precision cardiovascular medicine. Clin Chim Acta 2020; 509: 67–71.

23. Al’Aref, Maliakal, Singh, et al. Machine learning of clinical variables and coronary artery calcium scoring for the prediction of obstructive coronary artery disease on coronary computed tomography angiography: analysis from the CONFIRM registry. Eur Heart J 2020; 41(3): 359–367.

24. Haug, Drazen. Artificial intelligence and machine learning in clinical medicine, 2023. N Engl J Med 2023; 388(13): 1201–1208.

25. Cutillo, Sharma, Foschini, et al. Machine intelligence in healthcare: perspectives on trustworthiness, explainability, usability, and transparency. npj Digit Med 2020; 3(1): 47.

26. Ngiam, Khor. Big data and machine learning algorithms for health-care delivery. Lancet Oncol 2019; 20(5): e262–e273.

27. Scott, Carter, Coiera. Clinician checklist for assessing suitability of machine learning applications in healthcare. BMJ Health Care Inform 2021; 28(1): e100251.

28. Tonekaboni, Joshi, McCradden, et al. (eds). What clinicians want: contextualizing explainable machine learning for clinical end use. Machine learning for healthcare conference. PMLR, 2019.

29. Hoff, Chomka, Krainik, et al. Age and gender distributions of coronary artery calcium detected by electron beam tomography in 35,246 adults. Am J Cardiol 2001; 87(12): 1335–1339.

30. Kaul, Enslin, Gross. History of artificial intelligence in medicine. Gastrointest Endosc 2020; 92(4): 807–812.

31. Han, Kang K-W, Kim, et al. Artificial intelligence-enabled ECG algorithm for the prediction of coronary artery calcification. Front Cardiovasc Med 2022; 9: 849223.

32. Huang, Ren, Yang, et al. Using a machine learning-based risk prediction model to analyze the coronary artery calcification score and predict coronary heart disease and risk assessment. Comput Biol Med 2022; 151: 106297.

33. Jones, Connors, Dunn, et al. Bacterial taxa and functions are predictive of sustained remission following exclusive enteral nutrition in pediatric Crohn’s disease. Inflamm Bowel Dis 2020; 26(7): 1026–1037.

34. Cicek, Bagci. AI-powered contrast-free cardiovascular magnetic resonance imaging for myocardial infarction. Front Cardiovasc Med 2024; 11: 1457498.

35. Cicek, Cikirikci EHK, Babaoğlu, et al. Machine learning for prognostic prediction in coronary artery disease with SPECT data: a systematic review and meta-analysis. EJNMMI Res 2024; 14(1): 117.

36. van der Aalst, Denissen, Vonder, et al. Screening for cardiovascular disease risk using traditional risk factor assessment or coronary artery calcium scoring: the ROBINSCA trial. Eur Heart J Cardiovasc Imaging 2020; 21(11): 1216–1224.

37. Panayides, Amini, Filipovic, et al. AI in medical imaging informatics: current challenges and future directions. IEEE J Biomed Health Inform 2020; 24(7): 1837–1857.

38. Martin, van Assen, Rapaka, et al. Evaluation of a deep learning–based automated CT coronary artery calcium scoring algorithm. Cardiovascular Imaging 2020; 13(2_Part_1): 524–526.

39. Adelhoefer, Uddin SMI, Osei, et al. Coronary artery calcium scoring: new insights into clinical interpretation, lessons from the CAC Consortium. Radiol Cardiothorac Imaging 2020; 2(6): e200281.