Comparing the accuracy of four machine learning models in predicting type 2 diabetes onset within the Chinese population: a retrospective study

Abstract

Objective

To evaluate the effectiveness of machine learning (ML) models in predicting 5-year type 2 diabetes mellitus (T2DM) risk within the Chinese population by retrospectively analyzing annual health checkup records.

Methods

We included 46,247 individuals (32,372 and 13,875 in the training and validation sets, respectively) from a national health checkup center database. Univariate and multivariate Cox analyses were performed to identify factors influencing T2DM risk. Extreme Gradient Boosting (XGBoost), support vector machine (SVM), logistic regression (LR), and random forest (RF) models were trained to predict 5-year T2DM risk. Model performance was assessed using receiver operating characteristic (ROC) curves for discrimination and calibration plots for agreement between predicted and observed risk.

Results

Key variables included fasting plasma glucose, age, and triglycerides. The LR model showed good accuracy, with areas under the ROC curve (AUCs) of 0.914 and 0.913 in the training and validation sets, respectively; the RF model exhibited favorable AUCs of 0.998 and 0.838. In calibration analysis, the LR model displayed good fit for low-risk patients, whereas the RF model exhibited satisfactory fit for both low- and high-risk patients.

Conclusions

LR and RF models can effectively predict T2DM risk in the Chinese population. These models may help identify high-risk patients and guide interventions to prevent complications and disabilities.

Keywords

Machine learning; prediction model; type 2 diabetes mellitus; Chinese population; logistic regression; random forest; XGBoost; support vector machine; fasting plasma glucose

Introduction

Diabetes mellitus is a metabolic disorder characterized by elevated blood glucose levels due to impaired glucose regulation. Type 2 diabetes mellitus (T2DM), the most common form of diabetes, accounts for approximately 90% of cases worldwide.¹ T2DM is a chronic disease influenced by genetic and environmental factors.² Long-term uncontrolled T2DM can lead to complications in multiple organ systems. T2DM-related cardiovascular and renal diseases are leading causes of morbidity, and the resulting disability and mortality impose substantial economic burdens on families and society.³

During rapid economic growth over the past three decades, diabetes prevalence in mainland China has increased tenfold.⁴ In 2013, an estimated 113.9 million Chinese adults had diabetes and 493.4 million had prediabetes, and both figures are rising.⁵ Population aging is further increasing the number of people at risk of T2DM, which presents a challenge for Chinese healthcare providers.⁶ To improve T2DM awareness and early intervention, the Chinese Diabetes Society revised its diagnostic criteria in 2020. However, many individuals with T2DM are asymptomatic or lack access to routine diabetes screening; thus, 8.1% of the Chinese population with diabetes remains undiagnosed.⁵ Beyond the economic burden of diabetes, these gaps highlight the potential of prognostic models to transform healthcare by identifying high-risk individuals early.

Machine learning (ML) has been increasingly used to predict non-communicable disease risk since artificial intelligence became widespread in the 2010s. ML-based approaches to diabetes screening, diagnosis, and risk prediction have been reported elsewhere.⁷ Prediction models built with ML can help doctors and patients better understand disease progression, enabling appropriate preventive and therapeutic measures. ML has shown good accuracy and flexibility in predicting T2DM.⁸ However, challenges persist in this field, including dataset quality and quantity as well as model selection and optimization. Several ML models can help predict T2DM risk, but their reliability requires further validation because they were trained and tested on small datasets.⁹ Some researchers have suggested that ML models for predicting diabetes risk do not outperform conventional risk stratification models,¹⁰ primarily because of limited sample sizes. Previous models have additional limitations: their performance varies with the input variables, and their reproducibility across ethnicities and populations is uncertain. Because model construction requires substantial time and resources, these limitations should be considered when developing and evaluating prediction models. Recent studies have also revealed challenges regarding generalizability and performance across populations and subgroups.⁹,¹¹ Thus, although ML has broad potential for predicting T2DM risk, several persistent issues must be addressed. We sought to improve the application of ML in predicting T2DM risk by using a larger dataset and testing the discrimination and calibration performance of four ML models (Extreme Gradient Boosting [XGBoost], support vector machine [SVM], logistic regression [LR], and random forest [RF]).

Materials and methods

Study population and data acquisition

Clinical data were collected from an open-access database, maintained by the Rich Healthcare Group, containing basic demographic information, past medical history, and laboratory results recorded from 2010 to 2016.¹² These data included age, sex, body mass index (BMI), systolic and diastolic blood pressure (SBP and DBP), fasting plasma glucose (FPG), total cholesterol (TC), triglycerides (TG), high-density lipoprotein (HDL) cholesterol, low-density lipoprotein (LDL) cholesterol, alanine transaminase (ALT), aspartate aminotransferase (AST), blood urea nitrogen (BUN), serum creatinine (CR), and family history of T2DM. Family history of T2DM was defined as a diagnosis of T2DM in parents, siblings, and/or offspring at the time of data collection. Individuals who met the following Chinese Diabetes Society 2020 diagnostic criteria for diabetes¹³ at the initial checkup were excluded: typical diabetic symptoms plus random plasma glucose ≥11.1 mmol/L, FPG ≥7.0 mmol/L, 2-hour plasma glucose ≥11.1 mmol/L during an oral glucose tolerance test, or glycated hemoglobin (HbA1c) ≥6.5%. Asymptomatic individuals were required to undergo confirmatory repeat testing on another day. Individuals were also excluded if any follow-up data were missing within 5 years after the initial evaluation. Incident T2DM was diagnosed according to the above criteria or as recorded in the database.

This study was conducted in accordance with the 2013 revision of the Declaration of Helsinki, and this report was written in compliance with the STROBE guidelines.¹⁴ All patient information was deidentified to ensure privacy. This retrospective study did not require formal Institutional Review Board approval because it constituted a secondary analysis of a public dataset; for the same reason, written participant consent was not required.
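The baseline exclusion rule can be expressed as a small predicate. The sketch below is a simplified, hypothetical reading of the criteria above (the function and argument names are ours, not the authors'); it assumes glucose values in mmol/L and HbA1c in %, and, following the usual convention, counts random plasma glucose only when typical symptoms are present.

```python
from typing import Optional


def meets_diabetes_criteria(
    symptomatic: bool,
    fpg: Optional[float] = None,        # fasting plasma glucose, mmol/L
    random_pg: Optional[float] = None,  # random plasma glucose, mmol/L
    ogtt_2h: Optional[float] = None,    # 2-h plasma glucose during an OGTT, mmol/L
    hba1c: Optional[float] = None,      # glycated hemoglobin, %
    confirmed_on_repeat: bool = False,  # abnormal result reproduced on another day
) -> bool:
    """Hypothetical helper: True if a checkup record meets the diagnostic thresholds."""
    abnormal = any([
        fpg is not None and fpg >= 7.0,
        ogtt_2h is not None and ogtt_2h >= 11.1,
        hba1c is not None and hba1c >= 6.5,
        # Random glucose is only diagnostic together with typical symptoms.
        symptomatic and random_pg is not None and random_pg >= 11.1,
    ])
    if symptomatic:
        return abnormal
    # Asymptomatic individuals need confirmation by repeat testing.
    return abnormal and confirmed_on_repeat


# Invented records: those meeting the criteria at the initial checkup are excluded.
baseline = [
    {"symptomatic": False, "fpg": 5.1, "hba1c": 5.4},
    {"symptomatic": True, "fpg": 7.4},
    {"symptomatic": False, "fpg": 7.2, "confirmed_on_repeat": True},
]
kept = [r for r in baseline if not meets_diabetes_criteria(**r)]
```

In a real pipeline the same predicate would be applied to every baseline record before the 5-year follow-up filter.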

Selection of prognostic variables

We selected 15 variables (age, sex, BMI, SBP, DBP, FPG, TC, TG, HDL cholesterol, LDL cholesterol, ALT, AST, BUN, CR, and family history of T2DM) that are strongly associated with T2DM risk, aiming to preserve model discrimination capacity while avoiding redundancy during ML model development. Because T2DM onset occurs at different times during follow-up, survival analysis (i.e., Cox proportional hazards regression) was used to assess these time-to-event outcomes. Variable importance was ranked using a tree-based technique.¹⁵ The data-driven selection of these variables was then checked against published literature to confirm the rationale for their inclusion in the model.¹⁶–¹⁹
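The univariate Cox screening step can be illustrated for a single covariate. This minimal numpy sketch is our own hypothetical helper, not the authors' R code (in R, `survival::coxph` is the standard implementation); it maximizes the Cox partial likelihood by Newton–Raphson, assumes Breslow handling of tied event times, and reports the hazard ratio, Wald 95% CI, and P value. The multivariate model generalizes the same idea to a coefficient vector and an information matrix. The data are simulated.

```python
import math

import numpy as np


def univariate_cox(time, event, x, max_iter=50, tol=1e-10):
    """Hypothetical helper: fit h(t | x) = h0(t) * exp(beta * x) by Newton-Raphson
    on the Cox partial likelihood (Breslow handling of tied event times)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    x = np.asarray(x, dtype=float)
    beta = 0.0
    for _ in range(max_iter):
        w = np.exp(beta * x)
        score, info = 0.0, 0.0
        for i in np.flatnonzero(event):
            at_risk = time >= time[i]          # risk set at the i-th event time
            s0 = w[at_risk].sum()
            s1 = (w * x)[at_risk].sum()
            s2 = (w * x * x)[at_risk].sum()
            score += x[i] - s1 / s0            # derivative of the log partial likelihood
            info += s2 / s0 - (s1 / s0) ** 2   # observed information (minus 2nd derivative)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    se = 1.0 / math.sqrt(info)                 # information at (essentially) the final estimate
    p_value = math.erfc(abs(beta / se) / math.sqrt(2.0))  # two-sided Wald test
    return {
        "hr": math.exp(beta),
        "ci95": (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se)),
        "p": p_value,
    }


# Simulated example (not study data): hazard rises with x, with random censoring.
rng = np.random.default_rng(0)
x = rng.normal(size=300)
event_time = rng.exponential(1.0 / np.exp(0.7 * x))
censor_time = rng.exponential(2.0, size=300)
observed = np.minimum(event_time, censor_time)
is_event = event_time <= censor_time
result = univariate_cox(observed, is_event, x)
```

Variables whose univariate P value passes a chosen cut-off would then be carried into the multivariate model.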

ML models

XGBoost²⁰: XGBoost is a gradient-boosted decision tree system widely used for classification and regression. It combines the outputs of many trees, each fitted sequentially to correct the errors of the ensemble built so far, to improve overall model performance. The modeling process begins with data preprocessing, missing value management, and categorical variable encoding. Model initialization involves setting hyperparameters, such as the number of trees and the learning rate. During training, each new tree is optimized to minimize the loss function, and regularization parameters are tuned to balance model complexity and generalizability. Finally, predictions are generated by aggregating the outputs of the individual trees. Although the full ensemble is harder to interpret, each decision tree has a transparent, hierarchical structure: each node represents a decision based on a specific feature, and the path from the root to a leaf node outlines the decision-making process.
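The step in which "each tree is optimized to minimize the loss function" can be made concrete with the regularized second-order formulas XGBoost uses to score splits: for gradient sum G and Hessian sum H in a node, the optimal leaf weight is −G/(H + λ), and a split's gain is half the improvement in G²/(H + λ) minus γ. The sketch below applies them to logistic loss for a single feature. It is an illustration, not the xgboost library's implementation; the helper names and the FPG-like values are invented.

```python
import numpy as np


def grad_hess_logistic(y, raw_score):
    """First and second derivatives of logistic loss with respect to the raw (log-odds) score."""
    p = 1.0 / (1.0 + np.exp(-raw_score))
    return p - y, p * (1.0 - p)


def leaf_weight(g, h, lam=1.0):
    """Optimal leaf value -G / (H + lambda)."""
    return -g.sum() / (h.sum() + lam)


def split_gain(g, h, left_mask, lam=1.0, gamma=0.0):
    """Loss reduction from splitting a node into left/right children."""
    def score(gs, hs):
        return gs.sum() ** 2 / (hs.sum() + lam)

    gl, hl = g[left_mask], h[left_mask]
    gr, hr = g[~left_mask], h[~left_mask]
    return 0.5 * (score(gl, hl) + score(gr, hr) - score(g, h)) - gamma


def best_threshold(feature, y, raw_score, lam=1.0, gamma=0.0):
    """Exhaustive search over thresholds of one feature (x <= thr goes left)."""
    g, h = grad_hess_logistic(y, raw_score)
    best = (None, -np.inf)
    for thr in np.unique(feature)[:-1]:        # last value excluded so the right child is non-empty
        gain = split_gain(g, h, feature <= thr, lam, gamma)
        if gain > best[1]:
            best = (thr, gain)
    return best


fpg = np.array([4.5, 4.8, 5.0, 5.2, 6.8, 7.1, 7.5])   # invented values, mmol/L
label = np.array([0, 0, 0, 0, 1, 1, 1], dtype=float)
initial_score = np.zeros_like(fpg)                       # raw score before the first tree
threshold, gain = best_threshold(fpg, label, initial_score)
```

Here λ (L2 penalty on leaf weights) and γ (minimum gain to split) play the role of the regularization parameters mentioned above.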

SVM²¹: SVM is a popular ML method that is well suited to high-dimensional (i.e., complex) feature spaces. In the pharmaceutical industry, it is used for compound classification, property prediction, and virtual compound screening. The modeling process begins with data preprocessing, which involves feature normalization and label encoding. Model initialization includes selecting an appropriate kernel function, such as a linear or radial basis function kernel; with a nonlinear kernel, SVM implicitly maps the data into a higher-dimensional space. During training, SVM seeks the hyperplane that maximizes the margin between classes while minimizing misclassification, and a regularization parameter balances margin width against training error. For prediction, SVM classifies new data points according to their position relative to this hyperplane. Although SVM models are less interpretable than linear models, the support vectors (i.e., data points that define the decision boundary) can provide insights into the decision process.
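The margin-versus-error trade-off can be seen in a linear soft-margin SVM trained by stochastic sub-gradient descent (the Pegasos scheme). This is a sketch under stated assumptions: linear kernel only, labels coded −1/+1, and a constant feature appended so the bias is regularized along with the weights. A radial basis function kernel would instead require a kernelized solver such as libsvm. Function names and data are hypothetical.

```python
import numpy as np


def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Pegasos-style stochastic sub-gradient descent on the soft-margin objective
    lam/2 * ||w||^2 + mean(max(0, 1 - y * (X @ w))), with labels y in {-1, +1}."""
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])   # constant column acts as a bias term
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            eta = 1.0 / (lam * t)                    # decreasing step size
            inside_margin = y[i] * (Xb[i] @ w) < 1   # hinge loss active for this point?
            w *= 1.0 - eta * lam                     # shrinkage from the margin (L2) term
            if inside_margin:
                w += eta * y[i] * Xb[i]              # move the boundary away from the point
    return w


def predict_svm(w, X):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.where(Xb @ w >= 0.0, 1, -1)


# Two well-separated simulated classes (not study data).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 0.5, (50, 2)), rng.normal(2.0, 0.5, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)
w = train_linear_svm(X, y)
accuracy = np.mean(predict_svm(w, X) == y)
```

A larger `lam` favors a wider margin at the cost of more training errors, mirroring the regularization parameter described above.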

LR²²: LR is an ML method frequently used for its simplicity and adequate classification performance. It is a practical tool for predicting the development, outcome, and prognosis of common diseases and malignancies (e.g., asthma, coronary heart disease, and colon cancer). Whether LR has advantages over other ML models for binary outcomes remains debated. Modeling begins with data preprocessing, which encompasses feature standardization and categorical variable encoding. Model initialization involves setting initial weights and a bias term. During training, the logistic function models class probabilities, and the weights are optimized to maximize the log-likelihood; L1 regularization is applied to prevent overfitting. During prediction, class probabilities are calculated, and binary decisions are made using a selected threshold. These models are inherently interpretable because they provide a coefficient for each input feature, indicating the strength and direction of that feature’s effect on the output. Clinicians can therefore readily understand how changes in input variables lead to changes in predicted outcomes.
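The training procedure described above (logistic function, log-likelihood maximization, L1 penalty) can be sketched with proximal gradient descent (ISTA), one standard way to fit L1-regularized LR. The solver choice, step size, and penalty strength are our assumptions; the paper does not report them. The soft-thresholding step is what lets L1 set uninformative coefficients exactly to zero.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_l1_logistic(X, y, alpha=0.05, lr=0.5, n_iter=1000):
    """Minimize mean negative log-likelihood + alpha * ||w||_1 by proximal gradient
    descent (ISTA). The intercept b is not penalized; X should be standardized first."""
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    for _ in range(n_iter):
        residual = sigmoid(X @ w + b) - y          # predicted probability minus outcome
        w = w - lr * (X.T @ residual) / n          # gradient step on the log-likelihood part
        # Soft-thresholding: the proximal step for the L1 penalty, able to zero out weights.
        w = np.sign(w) * np.maximum(np.abs(w) - lr * alpha, 0.0)
        b -= lr * residual.mean()
    return w, b


# Simulated, already standardized features (not study data); the third one is pure noise.
rng = np.random.default_rng(2)
X = rng.normal(size=(500, 3))
true_logit = 2.0 * X[:, 0] - 1.5 * X[:, 1]
y = (rng.random(500) < sigmoid(true_logit)).astype(float)
w, b = fit_l1_logistic(X, y, alpha=0.05)
w_strong, _ = fit_l1_logistic(X, y, alpha=1.0)    # heavy penalty shrinks every weight to zero
```

The signs and magnitudes of `w` are the interpretable coefficients mentioned above; `exp(w[j])` is the odds ratio per one-standard-deviation increase in feature j.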

RF²³: RF is an ensemble learning algorithm used for classification and regression. Advantages of RF models include ease of implementation, adaptability to various types of data, and the capacity to handle large amounts of data. Model initialization involves specifying the number of trees and other hyperparameters. During training, each tree (a weak classifier) is grown on a bootstrap sample of the data, and only a random subset of predictors is considered at each split; this combination of bootstrap aggregation and predictor randomization improves predictive accuracy. Predictions are aggregated across trees, either by voting (for classification) or by averaging (for regression). Additionally, feature importance assessment helps to identify the variables that contribute most strongly to model performance. Although an RF may be difficult to interpret at the level of individual trees, these feature importances help clinicians identify the features that most strongly drive its predictions.
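The two randomization steps (bootstrap rows, random predictor subsets) and the majority vote can be sketched with a deliberately simplified forest whose trees are depth-1 stumps chosen by Gini impurity. Real RF trees are grown much deeper; for a stump, the per-split and per-tree predictor subsets coincide. All function names and the simulated data are hypothetical.

```python
import numpy as np


def gini(labels):
    """Gini impurity of a 0/1 label array."""
    if labels.size == 0:
        return 0.0
    p = labels.mean()
    return 2.0 * p * (1.0 - p)


def fit_stump(X, y, features):
    """Best single split (depth-1 tree) over the candidate features, by weighted Gini."""
    best = None
    for j in features:
        for thr in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= thr
            impurity = left.mean() * gini(y[left]) + (~left).mean() * gini(y[~left])
            if best is None or impurity < best[0]:
                best = (impurity, j, thr,
                        int(y[left].mean() >= 0.5), int(y[~left].mean() >= 0.5))
    if best is None:                      # all candidate features constant: predict majority
        majority = int(y.mean() >= 0.5)
        return features[0], np.inf, majority, majority
    return best[1:]


def fit_forest(X, y, n_trees=25, max_features=None, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    m = max_features or max(1, int(np.sqrt(p)))
    forest = []
    for _ in range(n_trees):
        rows = rng.integers(0, n, size=n)            # bootstrap sample (with replacement)
        cols = rng.choice(p, size=m, replace=False)  # random subset of predictors
        forest.append(fit_stump(X[rows], y[rows], cols))
    return forest


def predict_forest(forest, X):
    votes = np.array([np.where(X[:, j] <= thr, left, right) for j, thr, left, right in forest])
    return (votes.mean(axis=0) >= 0.5).astype(int)  # majority vote across trees


# Simulated data: three noisy measurements of one latent risk score (not study data).
rng = np.random.default_rng(3)
z = rng.normal(size=200)
X = z[:, None] + 0.3 * rng.normal(size=(200, 3))
y = (z > 0).astype(int)
forest = fit_forest(X, y)
train_accuracy = np.mean(predict_forest(forest, X) == y)
```

Counting how often (or with how much impurity reduction) each feature is used across trees gives the feature importance referred to above.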

Statistical analysis

Statistical analyses were conducted with R software, version 4.2.0 (www.r-project.org). Baseline differences between the training and validation sets were analyzed using Fisher's exact test for categorical variables and the Mann–Whitney U test for continuous variables. To identify risk factors for T2DM, variables were analyzed by univariate and multivariate Cox proportional hazards regression. To evaluate ML model performance, receiver operating characteristic (ROC) curves were constructed, and the areas under the ROC curves (AUCs) were calculated to compare the models.²⁴ Performance metrics (accuracy, precision, recall, F1 score, and mean squared error) were calculated in the training and validation sets. Finally, calibration plots were generated to show the agreement between predicted probabilities and observed outcomes.²⁵
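The listed metrics can be computed from predicted probabilities as follows. The AUC is obtained as the Mann–Whitney U statistic divided by the number of positive–negative pairs, which is the interpretation given by Hanley and McNeil.²⁴ The 0.5 classification threshold and the reading of "mean squared error" as the mean squared difference between predicted probability and outcome (the Brier score) are our assumptions; the paper does not specify them. The toy data are invented.

```python
def auc_mann_whitney(y_true, scores):
    """AUC as the Mann-Whitney U statistic divided by n_pos * n_neg (ties count 1/2)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return u / (len(pos) * len(neg))


def classification_metrics(y_true, probs, threshold=0.5):
    """Threshold-based metrics plus mean squared error of the probabilities."""
    pred = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for y, yh in zip(y_true, pred) if y == 1 and yh == 1)
    fp = sum(1 for y, yh in zip(y_true, pred) if y == 0 and yh == 1)
    fn = sum(1 for y, yh in zip(y_true, pred) if y == 1 and yh == 0)
    tn = len(y_true) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall) if precision + recall else 0.0,
        # Mean squared error of predicted probabilities (the Brier score).
        "mse": sum((p - y) ** 2 for p, y in zip(probs, y_true)) / len(y_true),
    }


y_true = [0, 0, 1, 1]
probs = [0.10, 0.40, 0.35, 0.80]
print(auc_mann_whitney(y_true, probs), classification_metrics(y_true, probs))
```

The pairwise loop is quadratic in sample size; for tens of thousands of records a rank-based computation of U gives the same value faster.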

Results

Baseline characteristics of the study population

In total, 211,833 individuals were present in the original dataset. Of these, 165,586 were excluded because of incomplete data, leaving 46,247 individuals: 32,372 in the training set and 13,875 in the validation set. The baseline characteristics of individuals in the training and validation sets are presented in Table 1. The mean ages were 43.43 and 43.53 years in the training and validation sets, respectively; both groups had a higher percentage of men (55% and 56%) than women (45% and 44%). The mean BMI (23.32 kg/m² in both sets) was close to the threshold for overweight according to Chinese standards.²⁶ Mean SBP and DBP were within normal ranges, although the mean SBP in both groups (119.04 and 119.28 mmHg) approached the upper limit of normal. Other metabolic panel values were within normal reference ranges. Approximately 2.2% of individuals had a family history of T2DM. During the 5-year follow-up period, T2DM was diagnosed in 411 (1.3%) and 205 (1.5%) individuals in the training and validation sets, respectively.
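The reported counts can be cross-checked with simple arithmetic. The sketch also makes two implied quantities explicit: the training/validation split is almost exactly 70:30 (the splitting method is not described, so random assignment is only an assumption), and T2DM occurred in only about 1.3% of individuals, so a trivial rule predicting "no T2DM" for everyone would already reach about 98.7% accuracy. This is one reason AUC and calibration are more informative than accuracy for these data.

```python
def cohort_summary(original=211_833, excluded=165_586,
                   train_n=32_372, valid_n=13_875,
                   train_cases=411, valid_cases=205):
    """Derived cohort quantities from the counts reported in the Results."""
    analysed = original - excluded
    assert analysed == train_n + valid_n, "training + validation should equal the analysed cohort"
    total_cases = train_cases + valid_cases
    return {
        "analysed": analysed,
        "train_fraction": train_n / analysed,
        "train_incidence": train_cases / train_n,
        "valid_incidence": valid_cases / valid_n,
        # Accuracy of a trivial model that predicts "no T2DM" for everyone.
        "all_negative_accuracy": 1 - total_cases / analysed,
    }


summary = cohort_summary()
for key, value in summary.items():
    print(f"{key}: {value:.4f}" if isinstance(value, float) else f"{key}: {value}")
```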

Table 1.

Comparison of baseline characteristics between training and validation sets.

Baseline characteristics	Training set (n = 32,372)	Validation set (n = 13,875)	P value
Age, years (mean ± SD)	43.43 (12.66)	43.53 (12.82)	0.45
Sex, n (%)
Male	17,892 (55)	7,709 (56)	0.57
Female	14,480 (45)	6,166 (44)
BMI, kg/m² (mean ± SD)	23.32 (3.26)	23.32 (3.29)	0.83
SBP, mmHg (mean ± SD)	119.04 (16.42)	119.28 (16.54)	0.14
DBP, mmHg (mean ± SD)	74.21 (10.91)	74.18 (10.86)	0.85
FPG, mmol/L (mean ± SD)	4.94 (0.53)	4.94 (0.53)	0.80
TC, mmol/L (mean ± SD)	4.77 (0.88)	4.76 (0.87)	0.07
TG, mmol/L (median [IQR])	1.10 [0.75, 1.62]	1.10 [0.75, 1.63]	0.05
HDL cholesterol, mmol/L (mean ± SD)	1.38 (0.29)	1.37 (0.29)	0.60
LDL cholesterol, mmol/L (mean ± SD)	2.77 (0.67)	2.76 (0.66)	0.06
ALT, U/L (median [IQR])	18.00 [13.00, 27.00]	18.00 [13.00, 27.00]	0.71
AST, U/L (median [IQR])	22.00 [19.00, 27.00]	22.00 [19.00, 26.00]	0.49
BUN, mmol/L (median [IQR])	4.54 [3.83, 5.37]	4.59 [3.87, 5.39]	0.01
Creatinine, μmol/L (mean ± SD)	71.84 (16.30)	71.88 (15.94)	0.83
Family history of T2DM, n (%)	696 (2.2)	301 (2.2)	0.90
Incident T2DM during follow-up, n (%)	411 (1.3)	205 (1.5)	0.07

ALT, alanine transaminase; AST, aspartate aminotransferase; BMI, body mass index; BUN, blood urea nitrogen; DBP, diastolic blood pressure; FPG, fasting plasma glucose; HDL, high-density lipoprotein; IQR, interquartile range; LDL, low-density lipoprotein; SBP, systolic blood pressure; SD, standard deviation; TC, total cholesterol; T2DM, type 2 diabetes mellitus; TG, triglycerides.

Establishment of the prediction model

Univariate and multivariate Cox analyses identified nine independent risk factors for T2DM: FPG, age, TG, ALT, BMI, CR, DBP, sex, and family history of T2DM (Table 2). The remaining variables were excluded from the prediction model.

Table 2.

T2DM risk prediction via Cox proportional hazards regression modeling.

Variables	Univariable HR (95% CI)	Univariable P value	Multivariable HR (95% CI)	Multivariable P value
Age	1.03 (1.02, 1.04)	<0.01	1.06 (1.05, 1.06)	<0.01
Sex	0.90 (0.80, 1.02)	<0.01	0.96 (0.77, 1.19)	0.71
BMI	1.12 (1.09, 1.14)	<0.01	1.13 (1.11, 1.16)	<0.01
AST	1.01 (1.01, 1.02)	<0.01
HDL cholesterol	5.35 (4.89, 5.85)	0.112
LDL cholesterol	1.20 (1.15, 1.25)	<0.01
BUN	1.006 (1.004, 1.008)	<0.01
CR	1.04 (0.99, 1.09)	<0.01	0.99 (0.99, 1.00)	0.04
Triglycerides	1.001 (0.996, 1.003)	<0.01	1.16 (1.10, 1.22)	<0.01
Cholesterol		<0.01
ALT	1.15 (0.81, 1.62)	<0.01	1.01 (0.99, 1.01)	<0.01
DBP	0.95 (0.82, 1.09)	<0.01	1.01 (1.01, 1.02)	<0.01
SBP	1.04 (1.04, 1.04)	<0.01
FPG	1.08 (0.70, 1.66)	<0.01	5.38 (4.51, 6.42)	<0.01
Family history of T2DM	1.20 (0.82, 1.78)	0.044	1.80 (1.17, 2.77)	<0.01

ALT, alanine transaminase; AST, aspartate aminotransferase; BMI, body mass index; BUN, blood urea nitrogen; CI, confidence interval; CR, creatinine; DBP, diastolic blood pressure; FPG, fasting plasma glucose; HDL, high-density lipoprotein; HR, hazard ratio; LDL, low-density lipoprotein; SBP, systolic blood pressure.

The feature importances of all included variables were analyzed using the XGBoost model. The variables were ranked in the following order of importance: FPG, age, TG, ALT, BMI, CR, DBP, sex, and family history of T2DM (Figure 1). The x-axis in Figure 1 shows each feature's importance score, which reflects its contribution to the model's predictions rather than the direction of its association with T2DM. All ranked features were then used for model construction and performance evaluation.
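The paper does not state which importance measure Figure 1 uses. One common choice, total gain, sums the loss reduction of every split on a feature across all trees; the xgboost library exposes this and related measures through `Booster.get_score(importance_type=...)`. The sketch below aggregates hypothetical (feature, gain) split records; the values are invented and do not come from the study's model.

```python
from collections import defaultdict


def total_gain_importance(splits):
    """Sum split gains per feature across all trees, normalize to proportions, rank high to low."""
    totals = defaultdict(float)
    for feature, gain in splits:
        totals[feature] += gain
    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(feature, gain / grand_total) for feature, gain in ranked]


# Invented split records gathered from several trees (not the study's model).
splits = [("FPG", 5.0), ("age", 2.0), ("FPG", 3.0), ("TG", 1.0)]
importance = total_gain_importance(splits)
```

Because gains are non-negative, scores of this kind rank features by contribution but carry no sign; direction of effect has to come from a model such as LR or from the Cox hazard ratios.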

Figure 1.

Feature importance ranking in the XGBoost model. ALT, alanine transaminase; BMI, body mass index; CR, creatinine; DBP, diastolic blood pressure; FPG, fasting plasma glucose; XGBoost, Extreme Gradient Boosting.

Discrimination and calibration performances of the four models

The nine variables were used to construct XGBoost, SVM, LR, and RF models. Performance evaluation of the four models revealed training set AUCs of 0.986, 0.896, 0.914, and 0.998, respectively; validation set AUCs were 0.812, 0.668, 0.913, and 0.838, respectively. These results are presented as ROC curves in Figure 2 and Figure 3. In summary, all four models showed high AUCs in the training set, but only the LR and RF models maintained favorable AUCs in the validation set.
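A simple way to read these results is the drop from training to validation AUC, a rough indicator of overfitting. Using only the values reported above, LR loses 0.001, whereas RF, XGBoost, and SVM lose 0.160, 0.174, and 0.228, respectively.

```python
auc = {  # (training AUC, validation AUC) as reported in the Results
    "XGBoost": (0.986, 0.812),
    "SVM": (0.896, 0.668),
    "LR": (0.914, 0.913),
    "RF": (0.998, 0.838),
}

drop = {model: train - valid for model, (train, valid) in auc.items()}
for model in sorted(auc, key=lambda m: auc[m][1], reverse=True):
    train, valid = auc[model]
    print(f"{model:8s} train={train:.3f} valid={valid:.3f} drop={drop[model]:.3f}")
```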

Figure 2.

ROC curves for the four models in the training set. AUC, area under the ROC curve; LR, logistic regression; SVM, support vector machine; RF, random forest; ROC, receiver operating characteristic; XGBoost, Extreme Gradient Boosting.

Figure 3.

Calibration plots for the four ML models in the training set. ML, machine learning; LR, logistic regression; SVM, support vector machine; RF, random forest.

The Hosmer–Lemeshow test²⁷ was used to assess the calibration of the four models. Agreement between predicted and observed risk in the training and validation sets is depicted in Figure 4 and Figure 5. All four models showed good agreement between predicted and observed outcomes in the training set (Figure 4). In the validation set, the LR and RF models demonstrated relatively good fit, although the LR model appeared to overestimate T2DM risk in some patients (Figure 5).
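The Hosmer–Lemeshow statistic sorts individuals by predicted risk, splits them into groups (conventionally deciles), and compares observed with expected event counts in each group; on development data it is referred to a chi-square distribution with (groups − 2) degrees of freedom. The sketch below is a minimal pure-Python version with invented data; the p-value uses the closed-form chi-square tail, which is valid only for even degrees of freedom (e.g., 10 groups give 8). As Paul et al.²⁷ note, the test's power grows with sample size, so in datasets of this size even clinically trivial miscalibration can reach significance.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable; this closed form holds only for even df."""
    if df % 2:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


def hosmer_lemeshow(y_true, probs, groups=10):
    """Hosmer-Lemeshow statistic over groups of sorted predicted risk, with groups - 2 df.
    Assumes every group's mean predicted risk lies strictly between 0 and 1."""
    pairs = sorted(zip(probs, y_true))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        n_g = len(chunk)
        observed = sum(y for _, y in chunk)
        expected = sum(p for p, _ in chunk)
        mean_p = expected / n_g
        stat += (observed - expected) ** 2 / (n_g * mean_p * (1.0 - mean_p))
    return stat, chi2_sf_even_df(stat, groups - 2)


# Invented toy data: four risk groups of five people each.
probs = [0.2] * 5 + [0.4] * 5 + [0.6] * 5 + [0.8] * 5
calibrated = [1, 0, 0, 0, 0] + [1, 1, 0, 0, 0] + [1, 1, 1, 0, 0] + [1, 1, 1, 1, 0]
underestimated = [1] * 20   # every individual develops the outcome
print(hosmer_lemeshow(calibrated, probs, groups=4))
print(hosmer_lemeshow(underestimated, probs, groups=4))
```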

Figure 4.

ROC curves for the four models in the validation set. AUC, area under the ROC curve; LR, logistic regression; SVM, support vector machine; RF, random forest; ROC, receiver operating characteristic; XGBoost, Extreme Gradient Boosting.

Figure 5.

Calibration plots for the four ML models in the validation set. LR, logistic regression; ML, machine learning; SVM, support vector machine; RF, random forest.

Discussion

T2DM is a common chronic metabolic disease affecting numerous individuals in China. Uncontrolled T2DM can cause irreversible complications that lead to disability and mortality. In this study, we used nine variables to construct four ML models, then evaluated their performances with real-world data. Early detection of T2DM can be achieved with accurate and uncomplicated prediction models. Therefore, we sought to identify the most favorable model that could provide guidance for T2DM prevention and management.

FPG was identified as the highest-ranking independent risk factor for T2DM. This is unsurprising because hyperglycemia is a core etiological aspect of T2DM and plays a key role in T2DM diagnosis according to all international guidelines.⁴,²⁸ FPG is the primary screening test for T2DM because of its simplicity, speed, and cost-efficiency, qualities valued by both primary care physicians and specialists. HbA1c, which reflects long-term plasma glucose levels, is essential for the screening, diagnosis, and monitoring of T2DM; however, it was not included in our original data because its importance was not widely recognized before the Chinese guidelines changed.⁴ A meta-analysis showed that an HbA1c level of 7.0% has 97.3% (95% confidence interval: 95.3–98.4) diagnostic sensitivity for T2DM.²⁹ HbA1c should be integrated into future models as more data become available.

Age was the second highest-ranking risk factor for T2DM, according to the XGBoost model. Increasing diabetes prevalence among older adults has been identified in diverse populations across various studies. A survey of adults aged >65 years in the United States showed that the prevalence of diabetes (mainly T2DM) nearly doubled from 1994 to 2003.³⁰ T2DM prevalence has been predicted to increase from 16.0% in 2005 to 32.7% by the year 2050.³¹ Similarly, in China, Li et al.³² reported a high burden of T2DM and revealed that the substantial increase in mortality from 1990 to 2019 was likely related to population aging. Therefore, it is crucial to consider age as a risk factor in T2DM prediction models.

TG, ALT, and BMI were the third- to fifth-ranked risk factors in our feature importance analysis. The accumulation of excess body fat is closely related to insulin resistance, a primary mechanism of glucose dysregulation. TG levels are strongly influenced by diet and reflect body fat accumulation. A cohort study of the triglyceride glucose–body mass index (TyG-BMI) found that it was associated with diabetes risk, particularly in young, middle-aged, and non-obese Chinese individuals.³³ In 2013, Kunutsor et al.³⁴ performed a meta-analysis of nine studies of the relationship between ALT and T2DM risk; their results suggested a moderate association, although publication bias may have influenced the findings.

In this study, we used an LR model incorporating nine independent risk factors identified through univariate and multivariate Cox analyses: FPG, age, TG, ALT, BMI, CR, DBP, sex, and family history of T2DM. Our LR model demonstrated robust discrimination and calibration in the training and validation sets. Its key strengths include interpretability (clear coefficients for each predictor facilitate comprehension by clinicians) and strong calibration, meaning that predicted probabilities closely align with observed outcomes. Furthermore, the LR model's simplicity and transparency make it suitable for clinical practice, supporting risk assessment and guiding intervention decisions.³⁵ However, LR models assume a linear relationship between predictor variables and the log-odds of the outcome, which may not hold when relationships are complex and non-linear. Additionally, unless interaction terms are specified, such models may not capture interactions between variables, which can reduce predictive accuracy relative to more flexible models such as RF.

This study also used an RF model, an ensemble learning technique, with the same nine risk factors. It displayed favorable discrimination in the training and validation sets. The RF model's advantages include its ability to handle the non-linear relationships and complex interactions often present in medical data. Its feature importance analysis offers insight into the relative contribution of each predictor variable, helping to prioritize interventions and future research. The ensemble nature of RF models can also mitigate the risk of overfitting, enhancing their generalizability across diverse medical datasets. Nevertheless, RF models lack the direct interpretability of LR models, and additional effort may be needed to elucidate the clinical significance of feature importance rankings. Furthermore, unoptimized RF models can overfit training data; thus, they require careful regularization and hyperparameter tuning.

In summary, both LR and RF models show promise for predicting T2DM risk, but each has distinct advantages and limitations. Model selection should consider trade-offs between interpretability and predictive accuracy, as well as specific clinical requirements. Further research and validation are needed to refine these models for practical clinical use. Notably, previous studies have used ML to predict T2DM risk,¹⁷ but they included fewer participants than our study.

This study had some limitations. First, the original database exclusively included individuals who attended health checkups at private facilities. This may have introduced selection bias, because individuals who can undergo such examinations, whether self-paid or employer-sponsored, may differ systematically from the general population; high-income populations generally have greater healthcare access and are less likely to experience delayed interventions for prediabetes.³⁶ Second, only four well-established ML models were compared in this study. Promising ML models have recently emerged with the growing use of artificial intelligence in medicine, and alternative models, such as K-nearest neighbor, merit exploration in future studies. Finally, follow-up in this database was limited to 5 years. Because T2DM is a chronic disease, patients are more likely to develop clinically significant manifestations at older ages; a longer follow-up period (e.g., 10 or 20 years) may yield more comprehensive and generalizable results.

Conclusion

We used nine variables to construct four ML models based on training and validation sets derived from real-world data. Our results showed that the LR and RF models were effective for predicting T2DM risk in the Chinese population. We believe that integrating these models into clinical practice and future research can aid clinicians and public health professionals by enabling accurate prediction of T2DM risk.

Footnotes

Acknowledgements

We thank all study participants for their donated blood samples and support.

Author contributions

HL performed statistical analysis. SD wrote the manuscript. LW, Jia L, and YD interpreted the data for analysis. Jing L, ZL, YW, LJ, and SY contributed to discussion and editing. HY and XF designed the study and revised the manuscript. HY and XF had full access to the data and final responsibility for the decision to submit for publication. All authors read and approved the final manuscript.

Data availability statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Declaration of conflicting interest

The authors declare that there is no conflict of interest.

Funding

This study was supported by the Medical Science Research Project of Hebei Province (20231924).

ORCID iD

Xiaomin Fu

References

1. Saeedi, Petersohn, Salpea, et al. Global and regional diabetes prevalence estimates for 2019 and projections for 2030 and 2045: Results from the International Diabetes Federation Diabetes Atlas, 9th edition. Diabetes Res Clin Pract 2019; 157: 107843.

2. Kolb, Martin. Environmental/lifestyle factors in the pathogenesis and prevention of type 2 diabetes. BMC Med 2017; 15: 131.

3. Zheng, Ley, FB. Global aetiology and epidemiology of type 2 diabetes mellitus and its complications. Nat Rev Endocrinol 2018; 14: 88–98.

4. Jia, Weng, Zhu, et al. Standards of medical care for type 2 diabetes in China 2019. Diabetes Metab Res Rev 2019; 35: e3158.

5. Wang, et al. Prevalence and control of diabetes in Chinese adults. JAMA 2013; 310: 948–959.

6. Jia. Diabetes: a challenge for China in the 21st century. Lancet Diabetes Endocrinol 2014; 2: e6–e7.

7. Woldaregay, Årsand, Walderhaug, et al. Data-driven modeling and prediction of blood glucose dynamics: machine learning applications in type 1 diabetes. Artif Intell Med 2019; 98: 109–134.

8. Deberneh, Kim. Prediction of type 2 diabetes based on machine learning algorithm. Int J Environ Res Public Health 2021; 18: 3317.

9. Choi, Rha, Kim, et al. Machine learning for the prediction of new-onset diabetes mellitus during 5-year follow-up in non-diabetic patients with cardiovascular risks. Yonsei Med J 2019; 60: 191–199.

10. Nomura, Noguchi, Kometani, et al. Artificial intelligence in current diabetes management and prediction. Curr Diab Rep 2021; 21: 61.

11. Singh, Mhasawade, Chunara. Generalizability challenges of mortality risk prediction models: a retrospective analysis on a multi-center database. PLOS Digital Health 2022; 1: e0000023.

12. Chen, Zhang, Yuan, et al. Association of body mass index and age with incident diabetes in Chinese adults: a population-based cohort study. BMJ Open 2018; 8: e021768.

13. American Diabetes Association Professional Practice Committee. 2. Diagnosis and Classification of Diabetes: Standards of Care in Diabetes-2024. Diabetes Care 2024; 47(Suppl 1): S20–S42. doi: 10.2337/dc24-S002.

14. von Elm, Altman, Egger, et al. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. Ann Intern Med 2007; 147: 573–577.

15. Banerjee, Reynolds, Andersson, et al. Tree-based analysis. Circ Cardiovasc Qual Outcomes 2019; 12: e004879.

16. Dagliati, Marini, Sacchi, et al. Machine learning methods to predict diabetes complications. J Diabetes Sci Technol 2018; 12: 295–302.

17. Joshi, Dhakal CK. Predicting type 2 diabetes using logistic regression and machine learning approaches. Int J Environ Res Public Health 2021; 18: 7346.

18. Olusanya, Ogunsakin, Ghai, et al. Accuracy of machine learning classification models for the prediction of type 2 diabetes mellitus: a systematic survey and meta-analysis approach. Int J Environ Res Public Health 2022; 19: 14280.

19. Fregoso-Aparicio, Noguez, Montesinos, et al. Machine learning and deep learning predictive models for type 2 diabetes: a systematic review. Diabetol Metab Syndr 2021; 13: 148.

20. Sagi, Rokach. Approximating XGBoost with an interpretable decision tree. Information Sciences 2021; 572: 522–542.

21. Yan, Wang, Lei. Micro learning support vector machine for pattern classification: a high-speed algorithm. Comput Intell Neurosci 2022; 2022: 4707637.

22. Christodoulou, Collins, et al. A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. J Clin Epidemiol 2019; 110: 12–22.

23. Rigatti SJ. Random forest. J Insur Med 2017; 47: 31–39.

24. Hanley, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982; 143: 29–36.

25. Sadatsafavi, Saha-Chaudhuri, Petkau. Model-based ROC curve: examining the effect of case mix and model calibration on the ROC plot. Med Decis Making 2022; 42: 487–499.

26. Wang, Zhou, Zhao, et al. Body-mass index and obesity in urban and rural China: findings from consecutive nationally representative surveys during 2004–18. Lancet 2021; 398: 53–63.

27. Paul, Pennell, Lemeshow. Standardizing the power of the Hosmer-Lemeshow goodness of fit test in large data sets. Stat Med 2013; 32: 67–80.

28. Buse, Wexler, Tsapas, et al. 2019 Update to: Management of Hyperglycemia in Type 2 Diabetes, 2018. A consensus report by the American Diabetes Association (ADA) and the European Association for the Study of Diabetes (EASD). Diabetes Care 2020; 43: 487–493.

29. Kaur, Lakshmi PVM, Rastogi, et al. Diagnostic accuracy of tests for type 2 diabetes and prediabetes: a systematic review and meta-analysis. PLoS One 2020; 15: e0242415.

30. Sloan, Bethel, Ruiz Jr, et al. The growing burden of diabetes mellitus in the US elderly population. Arch Intern Med 2008; 168: 192–199; discussion 9.

31. Narayan, Boyle, Geiss, et al. Impact of recent increase in incidence on future diabetes burden: U.S., 2005-2050. Diabetes Care 2006; 29: 2114–2116.

32. Guo, Cao. Secular incidence trends and effect of population aging on mortality due to type 1 and type 2 diabetes mellitus in China from 1990 to 2019: findings from the Global Burden of Disease Study 2019. BMJ Open Diabetes Res Care 2021; 9: e002529.

33. Wang, Liu, Cheng, et al. Triglyceride glucose-body mass index and the risk of diabetes: a general population-based cohort study. Lipids Health Dis 2021; 20: 99.

34. Kunutsor, Apekey, Walley. Liver aminotransferases and risk of incident type 2 diabetes: a systematic review and meta-analysis. Am J Epidemiol 2013; 178: 159–171.

35. Wang, Chen, et al. Prediction of type 2 diabetes risk and its effect evaluation based on the XGBoost model. Healthcare (Basel) 2020; 8: 247.

36. Ishihara, Babazono, Liu, et al. Impact of income and industry on new-onset diabetes among employees: a retrospective cohort study. Int J Environ Res Public Health 2022; 19: 1090.