Abstract
Determining the factors that contribute to a reliable prediction of metabolic syndrome will provide a deeper understanding of the medical indices involved in the prediction and assist in the early diagnosis and treatment of patients. The study examined the optimal number of National Cholesterol Education Program Adult Treatment Panel (NCEP ATP) III indices needed to make a reliable prediction of the syndrome, whether each of the five NCEP ATP III indices for predicting the syndrome is equally important, and whether a reliable prediction can be made using calculated blood pressure indices (estimated mean arterial pressure and pulse pressure) instead of NCEP ATP III blood pressure indices. The results show that NCEP ATP III indices for determination of the syndrome are not equally important. Moreover, the indices' importance and their prediction quality vary according to gender. Optimal results are obtained by using all five NCEP ATP III indices for prediction.
Keywords
Introduction
Metabolic syndrome is among the most common public health problems worldwide and a major contributor to cardiovascular disease and diabetes. 1 The syndrome is a prominent risk factor for minor ischemic stroke, and patients who have suffered a minor ischemic stroke and are diagnosed with metabolic syndrome are at a high risk of subsequent vascular events. 2 Metabolic syndrome is also associated with an increased risk of cardiovascular disease and mortality. According to the results of the National Health and Nutrition Examination Survey (NHANES) conducted in 2011–2016 among adults aged 20 and over in the US, the incidence of metabolic syndrome, as defined by the National Cholesterol Education Program (NCEP) ATP (Adult Treatment Panel) III, grows significantly with age. No significant differences in the prevalence of metabolic syndrome between men and women in each age group 1 were found. 3 Studies in nursing homes in Europe found that the prevalence of metabolic syndrome was high among the elderly population and increased with age. In these studies, in contrast, gender was seen to play a major role in predicting the prevalence of metabolic syndrome.4–6 Women apparently become more susceptible to metabolic syndrome with aging; thus, separate guidelines may be required for men and women. 7 The prevalence exhibited a commensurate rise with age, as suggested in 8, and was significantly higher among more elderly people in the overall sample and among women. For men, however, the prevalence rate trended toward a non-significant pattern with increasing age. 9
The NCEP ATP III definition of the National Institutes of Health (NIH) is the one most widely used and accepted by the international community and the WHO. According to this definition, the presence of metabolic syndrome is confirmed when at least three of the following five indicators are present: (i) waist circumference of at least 102 cm in men and at least 88 cm in women; (ii) blood triglyceride level higher than 150 mg/dL; (iii) HDL cholesterol (HDL-C) level in the blood lower than 40 mg/dL in men and 50 mg/dL in women; (iv) blood pressure of at least 130/85 mm Hg; (v) fasting blood glucose level higher than 100 mg/dL. 10
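The three-of-five rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used to label the dataset: the `Subject` class and its field names are hypothetical, and reading "at least 130/85 mm Hg" as "systolic at least 130 or diastolic at least 85" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Subject:
    """Hypothetical container for the five NCEP ATP III measurements."""
    is_male: bool
    waist_cm: float
    triglycerides_mg_dl: float
    hdl_mg_dl: float
    systolic_mm_hg: float
    diastolic_mm_hg: float
    fasting_glucose_mg_dl: float


def ncep_criteria_met(s: Subject) -> int:
    """Count how many of the five indicators listed in the text are present."""
    waist_limit = 102 if s.is_male else 88  # cm, gender-specific
    hdl_limit = 40 if s.is_male else 50     # mg/dL, gender-specific
    flags = [
        s.waist_cm >= waist_limit,
        s.triglycerides_mg_dl > 150,
        s.hdl_mg_dl < hdl_limit,
        # Assumption: either elevated systolic OR elevated diastolic counts.
        s.systolic_mm_hg >= 130 or s.diastolic_mm_hg >= 85,
        s.fasting_glucose_mg_dl > 100,
    ]
    return sum(flags)


def has_metabolic_syndrome(s: Subject) -> bool:
    """Metabolic syndrome is confirmed when at least three indicators are present."""
    return ncep_criteria_met(s) >= 3


man = Subject(True, 105, 160, 45, 120, 80, 95)
woman = Subject(False, 90, 160, 45, 120, 80, 95)
```

With identical measurements apart from waist circumference, the two hypothetical subjects are classified differently because the waist and HDL-C thresholds depend on gender.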
Machine learning algorithms have been shown to have the potential to help solve health problems by supporting classification systems that assist physicians in diagnosing disease and predicting its onset at an early stage. Extracting knowledge from medical data, however, is challenging because these data may be heterogeneous, disorganized and high-dimensional, and may contain “noise” and anomalies. 11 Today, clinicians and researchers are able to utilize machine learning to positively impact patient outcomes in a clinically meaningful way. 12
The prevalence of metabolic syndrome as a worldwide public health problem with high morbidity, mortality, and costs highlights the urgency of efforts to identify and modify risk factors for the syndrome. 10 Improving its prognosis through machine learning will improve the quality of life and even survival of men and women. Determining the main factors that contribute to the prediction of the syndrome will provide a deeper understanding of existing medical indices (NCEP ATP III and others) and assist in the implementation of early treatment of patients.
Our work, using machine learning, examines the primary tools used for predicting the syndrome. We seek to determine the optimal number of NCEP ATP III indices needed to make a reliable prediction, and whether each of the five NCEP ATP III indices used to predict the syndrome is equally important. If the indices are not equally important, we aim to discover their order of importance. We further intend to find out whether it would be beneficial to use calculated blood pressure indices instead of NCEP ATP III blood pressure indices (systolic and diastolic blood pressure) for syndrome prediction and diagnosis. Finally, we investigate the effect, if any, of gender on the indices’ importance and prediction value. To the best of our knowledge, we are the first researchers seeking to elucidate the above issues.
Our work emphasizes the importance of the order and the contribution of each NCEP ATP III index to metabolic syndrome prediction according to gender, and the need for at least four variables for a sufficient prediction.
Methods
In this study we analyzed a large dataset comprising the medical records, compiled over time, of a Spanish sample that underwent periodic health examinations at their workplace. Based on the dataset, we built and examined three datasets: a dataset for men and women, a dataset for men only and a dataset for women only. From the medical indices in the datasets, we derived additional, calculated medical indices that are widely used in medical practice. We sought to determine which of these indices were most important for predicting metabolic syndrome for the three groups (men and women, men only, and women only) 13 using the ExtraTreesClassifier 14 and univariate selection 15 methods. The importance of a feature refers to its contribution to higher chances of having metabolic syndrome. That is, as the importance of a feature increases, its contribution to the chance of having metabolic syndrome increases too. We predicted metabolic syndrome based on indices selected in the order of their importance ranking according to the ExtraTreesClassifier method.
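The two ranking methods named above can be illustrated with scikit-learn. The sketch below is only a hedged example: it uses synthetic data from `make_classification` rather than the study's records, the feature names are placeholders, and the min-max scaling step is an assumption added because the chi-square test requires non-negative inputs.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the medical indices (not the study's data).
X, y = make_classification(n_samples=500, n_features=8, n_informative=4,
                           random_state=0)
feature_names = [f"index_{i}" for i in range(X.shape[1])]  # placeholder names

# Method 1: impurity-based importances from an ensemble of extra trees.
forest = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(X, y)
tree_ranking = sorted(zip(feature_names, forest.feature_importances_),
                      key=lambda pair: pair[1], reverse=True)

# Method 2: univariate selection with the chi-square statistic.
# chi2 needs non-negative features, so values are rescaled to [0, 1] first.
X_nonneg = MinMaxScaler().fit_transform(X)
selector = SelectKBest(score_func=chi2, k="all").fit(X_nonneg, y)
chi2_ranking = sorted(zip(feature_names, selector.scores_),
                      key=lambda pair: pair[1], reverse=True)

for name, score in tree_ranking:
    print(f"{name}: {score:.3f}")
```

Running the same two rankings on each of the three datasets (men and women, men only, women only) would give one ordered list per method and per group.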
Random forest, 16 Naive Bayes, 17 k-nearest neighbor (KNN), 18 decision tree (CART) 19 and logistic regression 20 algorithms were applied and examined to select a classification algorithm for predicting metabolic syndrome.
In this examination, we found that a random forest algorithm with the Gini impurity criterion led to a more effective partitioning of the data. Further, including a larger number of trees in the forest (n_estimators) did not significantly increase the computational cost of generating a prediction, nor did it noticeably increase the bias of the model. K-nearest neighbors (KNN) with uniform weights reduces variance, although at the cost of larger bias. Decision tree (CART) with the Gini impurity criterion makes a purer separation, and the “best” splitter chooses the best split. Logistic regression with the ‘liblinear’ solver performs well with high dimensionality. For each prediction made, five algorithms were examined, and the prediction was evaluated according to four performance measurements: sensitivity, precision, F1-score, and MCC (Matthews correlation coefficient). From among these, we identified the classification algorithm with the highest F1-score and selected it for making further predictions. The F1-score was chosen because it balances sensitivity and precision. Using the selected algorithm, we predicted metabolic syndrome using four and five NCEP ATP III criteria. We additionally examined the contribution of blood pressure indices to the prediction of metabolic syndrome. We also examined the prediction of the syndrome using a neural network algorithm. Its performance measurements, however, were lower than those of the other algorithms, in contrast to the high performance reported elsewhere.21,22
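The selection procedure described above (five classifiers, four performance measurements, highest F1-score wins) might be sketched as follows. This is an illustration under stated assumptions rather than the study's code: the data are synthetic, five-fold cross-validation is assumed as the evaluation scheme, Gaussian Naive Bayes is assumed as the Naive Bayes variant, and only the hyperparameters named in the text are set explicitly.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in data (roughly 10% positives).
X, y = make_classification(n_samples=600, n_features=5, n_informative=4,
                           n_redundant=1, weights=[0.9, 0.1], random_state=0)

models = {
    "random forest": RandomForestClassifier(n_estimators=100, criterion="gini",
                                            random_state=0),
    "naive Bayes": GaussianNB(),  # assumption: Gaussian variant
    "KNN": KNeighborsClassifier(weights="uniform"),
    "decision tree (CART)": DecisionTreeClassifier(criterion="gini",
                                                   splitter="best",
                                                   random_state=0),
    "logistic regression": LogisticRegression(solver="liblinear"),
}

scoring = {
    "sensitivity": "recall",
    "precision": "precision",
    "f1": "f1",
    "mcc": make_scorer(matthews_corrcoef),
}

summary = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=5, scoring=scoring)
    # Average each measurement over the five folds.
    summary[name] = {metric: scores[f"test_{metric}"].mean() for metric in scoring}

# Select the algorithm with the highest mean F1-score.
best = max(summary, key=lambda name: summary[name]["f1"])
print(best, summary[best])
```

Which algorithm comes out on top depends on the data; on the synthetic set used here the result carries no meaning for the study's findings.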
Our dataset is an open-source free dataset in accordance with the Creative Commons Attribution Non-Commercial (CC BY-NC 4.0) license. Participants of the research in which that data was collected provided their written informed consents to participate. The protocol of that study complied with the Declaration of Helsinki for conducting medical research involving human subjects, authorized by Mallorca Health Management Ethical Review Committee of GESMA.
Study population
A free-to-use medical dataset was selected. The dataset contained records of 60,799 subjects who underwent periodic health examinations at their workplace in Spain between 2012 and 2016. 23 The indices in the dataset included personal information and health habits (age, gender and smoking), anthropometric measurements (body fat percentage, ABSI index, BMI, waist circumference and waist circumference-to-height ratio), systolic and diastolic blood pressure, blood measurements (total cholesterol, LDL cholesterol, HDL-C, fasting glucose and triglycerides) and the classification of whether or not the individual had metabolic syndrome. 14 Based on the dataset, we built and examined three datasets: a dataset for men and women, a dataset for men only and a dataset for women only. Calculated medical indices were added to each dataset (pulse pressure, estimated mean arterial pressure, ratio of HDL cholesterol to blood triglyceride level, non-HDL-C, and ratio of total cholesterol to blood triglyceride level).
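As a rough sketch of how the calculated indices and the three datasets could be produced with pandas: the column names below are hypothetical (the actual dataset's headers are not given here), and the two records are invented for illustration. Mean arterial pressure follows the formula in the paper's abbreviation note (DBP + 1/3 (SBP – DBP)); pulse pressure and non-HDL-C use their standard definitions (SBP minus DBP, and total cholesterol minus HDL-C).

```python
import pandas as pd


def add_calculated_indices(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the five calculated indices appended."""
    out = df.copy()
    out["pulse_pressure"] = out["sbp"] - out["dbp"]
    out["mean_arterial_pressure"] = out["dbp"] + (out["sbp"] - out["dbp"]) / 3
    out["non_hdl_c"] = out["total_cholesterol"] - out["hdl_c"]
    out["hdl_to_tg"] = out["hdl_c"] / out["triglycerides"]
    out["tc_to_tg"] = out["total_cholesterol"] / out["triglycerides"]
    return out


# Two invented records, only to show the shape of the computation.
records = pd.DataFrame({
    "gender": ["M", "F"],
    "sbp": [120.0, 135.0],
    "dbp": [80.0, 90.0],
    "total_cholesterol": [200.0, 220.0],
    "hdl_c": [50.0, 40.0],
    "triglycerides": [100.0, 160.0],
})

enriched = add_calculated_indices(records)

# The three datasets examined: everyone, men only, women only.
datasets = {
    "men and women": enriched,
    "men": enriched[enriched["gender"] == "M"],
    "women": enriched[enriched["gender"] == "F"],
}
```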
Measurement
A confusion matrix was calculated based on subjects’ metabolic syndrome classification in the datasets, and performance measurements were derived from it (accuracy, sensitivity, precision, F1-score, and MCC). The prevalence of metabolic syndrome in the study population, a young working population (mean age 40 years, standard deviation 10.3), was 9%, which is neither low nor high; a classification threshold of 0.5 is therefore appropriate.
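For reference, the five performance measurements follow directly from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the counts are hypothetical, chosen only so that prevalence is 9% as in the study population, and the function does not guard against empty rows or columns.

```python
from math import sqrt


def performance_measures(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard measures from a binary confusion matrix.

    tp/fp/fn/tn are true positives, false positives, false negatives and
    true negatives. No zero-division handling is included in this sketch.
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    sensitivity = tp / (tp + fn)          # also called recall
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts: 1,000 subjects, 90 of whom (9%) have the syndrome.
measures = performance_measures(tp=80, fp=20, fn=10, tn=890)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

Because the F1-score is the harmonic mean of precision and sensitivity, it drops sharply when either one is low, which is the balancing property that motivates its use as the selection criterion.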
Data analysis
Feature importance of the indices used in the datasets
Using the ExtraTreesClassifier and univariate selection methods, we determined which indices were most important. We applied these methods to all three datasets in order to examine gender effects. The preferred classification algorithm for predicting metabolic syndrome per subgroup was selected based on the highest F1-score. To this end, boxplots were prepared for each dataset for the classification algorithms we used: random forest, Naive Bayes, decision tree (CART), KNN and logistic regression. As mentioned above, to evaluate the quality of metabolic syndrome prediction, the following performance measures were calculated: accuracy, sensitivity, precision, F1-score and MCC.
Improving metabolic syndrome prediction using four and five National Cholesterol Education Program Adult Treatment Panel III criteria
Determining metabolic syndrome as defined by NCEP ATP III requires using at least three of five criteria (blood triglyceride level, waist circumference, HDL-C, blood glucose level, and blood pressure). We evaluated the predictive quality of the syndrome using three, four and five indices according to NCEP ATP III, with respect to sensitivity, precision, F1-score and MCC.
Combinations of three and four indices using the men and women dataset were selected according to their importance ranking by the ExtraTreesClassifier method. The following are the combinations we used (Figure 1):
Three most important indices: blood triglyceride level, waist circumference and HDL-C.
Four most important indices: blood triglyceride level, waist circumference, HDL-C and blood glucose level.
Five most important indices: blood triglyceride level, waist circumference, HDL-C, blood glucose level and blood pressure.
Figure 1. Feature importance ranking of the men and women dataset indices.

A specific algorithm was selected for each number of indices according to the highest F1-score. For three indices, the Naive Bayes algorithm was selected for the men and women dataset, the random forest algorithm for the men dataset and the KNN algorithm for the women dataset. For four and five indices, the random forest algorithm was selected.
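The comparison of three, four and five indices can be sketched as a loop over the top-ranked features. The example below is an assumption-laden illustration: the data are synthetic, a random forest is used for every subset for simplicity (the study selected Naive Bayes and KNN for some three-index cases), and five-fold cross-validated F1 stands in for the study's evaluation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Five synthetic features standing in for the five NCEP ATP III indices.
X, y = make_classification(n_samples=600, n_features=5, n_informative=5,
                           n_redundant=0, random_state=0)

# Rank the features from most to least important.
ranker = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(X, y)
order = np.argsort(ranker.feature_importances_)[::-1]

f1_by_count = {}
for k in (3, 4, 5):
    top_k = order[:k]  # column indices of the k most important features
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    f1_by_count[k] = cross_val_score(model, X[:, top_k], y,
                                     cv=5, scoring="f1").mean()

for k, score in f1_by_count.items():
    print(f"top {k} indices: mean F1 = {score:.3f}")
```

Ranking on the full data before cross-validating leaks a little information into the evaluation; a stricter variant would recompute the ranking inside each training fold.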
Contribution of blood pressure indices to the prediction of metabolic syndrome
We predicted metabolic syndrome using blood pressure indices (systolic and diastolic blood pressure from the medical dataset, and the additional calculated indices, estimated mean arterial pressure and pulse pressure) in two index groupings ranked by importance using the ExtraTreesClassifier method. The aim was to find the contribution of calculated blood pressure indices to the prediction of metabolic syndrome. Each group of base indices was combined either with the NCEP ATP III blood pressure indices or with the calculated blood pressure indices: Index Group 1: blood triglyceride level, blood glucose level, waist circumference and HDL-C (hereinafter referred to as Group 1 base indices). Index Group 2: blood triglyceride level, blood glucose level and non-HDL-C (hereinafter referred to as Group 2 base indices).
For Groups 1 and 2, random forest was the chosen algorithm for all three datasets (men and women, men only, and women only). Performance measurements of the Group 1 and Group 2 base indices combined with estimated mean arterial pressure and pulse pressure were compared to those of the same base indices combined with systolic and diastolic blood pressure.
Results
Feature importance of the dataset’s indices
We examined which indices are most important as ranked by the ExtraTreesClassifier and univariate selection methods for men and women, for men only, and for women only.
Feature importance of indices using the ExtraTreesClassifier method
Abbreviations: SBP – Systolic Blood Pressure; DBP – Diastolic Blood Pressure; PP – Pulse Pressure; MAP – Mean Arterial Pressure = $\mathrm{DBP} + \frac{1}{3}(\mathrm{SBP} - \mathrm{DBP})$; ABSI index – $\mathrm{WC} / (\mathrm{BMI}^{2/3} \times \mathrm{height}^{1/2})$; BF – %Body Fat = $1.2 \times \mathrm{BMI} + 0.23 \times \text{age (years)} - 10.8 \times \text{gender} - 5.4$, where gender is 0 for women and 1 for men; BMI – body weight (kg) divided by height (m) squared, in kg/m$^2$.
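The anthropometric formulas in the abbreviation note translate directly into code. The following sketch is illustrative only: the function names are hypothetical, and expressing waist circumference and height in metres for ABSI is an assumption, since the note does not state the units.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2


def absi(waist_m: float, bmi_value: float, height_m: float) -> float:
    """ABSI = WC / (BMI^(2/3) * height^(1/2)).

    Assumption: waist circumference and height are both given in metres.
    """
    return waist_m / (bmi_value ** (2 / 3) * height_m ** 0.5)


def body_fat_percent(bmi_value: float, age_years: float, is_male: bool) -> float:
    """%Body fat = 1.2*BMI + 0.23*age - 10.8*gender - 5.4 (women 0, men 1)."""
    gender = 1 if is_male else 0
    return 1.2 * bmi_value + 0.23 * age_years - 10.8 * gender - 5.4


# Hypothetical subject: 70 kg, 1.75 m, waist 0.90 m, 40 years old.
subject_bmi = bmi(70.0, 1.75)
subject_absi = absi(0.90, subject_bmi, 1.75)
subject_bf_male = body_fat_percent(25.0, 40, is_male=True)
subject_bf_female = body_fat_percent(25.0, 40, is_male=False)
```

For the same BMI and age, the formula assigns women a body fat percentage 10.8 points higher than men, which is the only place gender enters these calculated indices.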
Figure 1 shows the feature importance ranking of the men and women dataset indices. We observed that the five NCEP ATP III indices ranked from high to low were: blood triglyceride level, waist circumference, HDL-C, blood glucose level and blood pressure.
Figure 2 shows the feature importance ranking of the men only dataset indices. We observe that the five NCEP ATP III indices ranked from high to low are: blood triglyceride level, waist circumference, HDL-C, blood glucose level and blood pressure.
Figure 2. Feature importance ranking of the men only dataset indices.
Figure 3 shows the feature importance ranking of the women only dataset indices. We observe that the five NCEP ATP III indices ranked from high to low are: blood triglyceride level, blood glucose level, waist circumference, blood pressure and HDL-C.
Figure 3. Feature importance ranking of the women only dataset indices.
Feature importance of the three dataset indices using the univariate selection (chi-square) method
Importance ranking of the men and women dataset indices, men dataset indices and women dataset indices.
We found that the most important index according to both methods for the three datasets was the NCEP ATP III blood triglyceride level index. Aside from this index, the importance rankings of the other indices generated by the ExtraTreesClassifier and univariate selection (chi-square) methods differ across the three datasets. The HDL-C index was ranked lowest by the univariate selection (chi-square) method, and the diastolic blood pressure index was ranked lowest by the ExtraTreesClassifier. Of the calculated blood pressure indices, pulse pressure was the least important measure in both methods.
Improving metabolic syndrome prediction by using four and five National Cholesterol Education Program Adult Treatment Panel III indices
By definition, three of the five NCEP ATP III metrics are sufficient to determine metabolic syndrome. Our study shows that adding a fourth or fifth NCEP ATP III index to predict the syndrome, instead of just three, improves the quality of prediction of metabolic syndrome, noticeably for women.
Prediction quality when using various numbers of indices in the men and women dataset, men dataset and women dataset.
Confusion matrices of the predictions when using various numbers of indices in the men and women dataset, men dataset and women dataset.
Contribution of blood pressure indices to metabolic syndrome prediction
Group 1
Group 1 performance measurements in the men and women dataset.
Group 1 confusion matrixes in the men and women dataset.
Group 2
Group 2 performance measurements in the men and women dataset.
Group 2 confusion matrixes in the men and women dataset.
Discussion
Feature importance of the datasets’ indices
The study showed that the ExtraTreesClassifier and univariate selection (chi-square) methods rank the importance of indices differently for men and women taken as one group and for men and women viewed as separate groups. The NCEP ATP III criteria were not equally important when making a prediction. In both methods, the most highly ranked NCEP ATP III criterion, both for men and women as a group and for men and women separately, was blood triglyceride level, consistent with previous findings.24,25 This contrasts with a study in which blood triglyceride level in men and waist-to-height ratio in women showed the strongest predictive strength for the syndrome. 26 Of the remaining criteria, waist circumference was in the top five for all datasets. The waist-to-height ratio index was useful in the identification of metabolic syndrome27,28 and had important diagnostic value for metabolic syndrome in older adults.29–31 Of the top five indices ranked important in both methods, two indices appear on both methods’ lists (blood triglyceride level, waist circumference), three appear on only one list and another three appear only on the other list.
Contribution of blood pressure indices to metabolic syndrome prediction
To the best of our knowledge, our study is the first to examine calculated blood pressure measures (estimated mean arterial pressure and pulse pressure) when predicting metabolic syndrome using machine learning. According to the study findings, predictions using the NCEP ATP III blood pressure indices are more accurate than predictions using estimated mean arterial pressure and pulse pressure. Further, predictions using estimated mean arterial pressure are more accurate than predictions using pulse pressure.
The fact that the dataset was homogeneous (residents of Spain, all of whom were employed) constitutes a limitation of the research. Therefore, it is recommended that a future study be conducted that examines the prediction and importance of NCEP ATP III criteria and other indices for the prediction of metabolic syndrome in other populations based on additional datasets.
The results of the study can be used, with clinical informatics considerations, to assist in making monitoring and treatment decisions for patients through digital health systems. Digital health interventions can be beneficial for improving systolic blood pressure and anthropometric outcomes such as waist circumference. 32 Automated follow-up and alert systems provide support for the control and reduction of diseases associated with high blood pressure. Integration of new physiological variables for monitoring is feasible, and will broaden the scope for the early detection of chronic diseases, including those associated with the metabolic syndrome, which could result in a reduction in their frequency. 33 The use of E-Health apps is in most cases voluntary, but persistence toward goals can set the stage for long-term use, thereby promoting more favorable health outcomes. 34 The use of digital health-based lifestyle interventions may increase engagement and persistence in self-healthcare, 35 which is essential in the case of the syndrome.
Conclusions
In this study, methods for predicting the syndrome through machine learning were examined. Using data drawn from a Spanish sample, the study examined the optimal number of NCEP ATP III indices needed to make a reliable prediction of the syndrome; whether each of the five NCEP ATP III indices for predicting the syndrome is equally important and, if not, what their order of importance is; whether a reliable prediction can be made using calculated blood pressure indices (estimated mean arterial pressure and pulse pressure) instead of NCEP ATP III blood pressure indices; and the effect of gender on the indices’ importance and on the prediction.
According to the NCEP ATP III definition, at least three criteria are to be selected, with the assumption that all five indices for determination of the syndrome are of equal importance. In this study we show that this may not be the case. Moreover, the importance and prediction quality of the indices, which included NCEP ATP III indices as well as other personal information, health habits, anthropometric measurements and blood measurements, vary according to gender. Optimal results are obtained by using all five indices for prediction, and even when using just four instead of three NCEP ATP III indices, good prediction can be attained.
For predicting metabolic syndrome, NCEP ATP III (systolic and diastolic blood pressure) indices are better than the calculated indices (estimated mean arterial blood pressure and pulse pressure), and an estimated mean arterial pressure index is better than a pulse pressure index.
Future research could deepen the understanding of gender and other demographic data, such as age, for the purpose of accurate and focused prediction of metabolic syndrome. It could further study different ethnic groups, as well as working and non-working subjects, with the aim of improving medical care. A SHAP chart could deepen the understanding of the positive and negative contribution of each feature to metabolic syndrome prediction. Analyzing the data and predicting the syndrome for specific groups of medical and other indices, and addressing the difficulty and cost of producing each index or set of indices separately, may improve the cost–benefit ratio of diagnosing metabolic syndrome. Predicting the syndrome and investing in prevention efforts among high-risk populations could reduce the risk of mortality, morbidity and decline in quality of life.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
