Abstract

Background:

Progression from prediabetes to type 2 diabetes (T2D) can be delayed with early detection and intervention. Current detection methods, relying on costly blood glucose tests, limit widespread screening. Machine learning models offer the potential for non–laboratory-based tools. However, existing prediabetes detection models lack validation in their intended target populations. Thus, this study aimed to develop and validate a non–laboratory-based machine learning tool for prediabetes detection in a specific target population.

Methods:

Based on 501 adults from a prediabetes screening project, a decision tree model was developed. Twelve potential non–laboratory-based features were extracted. The target variable was categorized into prediabetes (hemoglobin A1c [HbA_1c] ≥39 mmol/mol and <48 mmol/mol) and normoglycemia (HbA_1c <39 mmol/mol). The data set was divided into 70% for training and 30% for validation, and forward feature selection was used to identify the most relevant features.

Results:

Out of 501 participants, 88 were identified with prediabetes. The mean age and body mass index (BMI) were approximately 50 years and 27, respectively, in both the training and validation sets. Forward selection identified age and waist circumference as the most important features to include in the model. The model achieved an area under the receiver operating characteristic curve (ROC AUC) of 0.8297 and 0.7961 on the training and validation sets, respectively.

Conclusion:

A machine learning screening tool using age and waist circumference was developed with promising results. Its simplicity, by only requiring two non-laboratory features, allows for easy implementation. However, to verify the model’s generalizability and external validity, it needs to be evaluated using additional data.

Keywords

decision support, machine learning, non-laboratory, prediabetes, screening tool, type 2 diabetes prevention

Introduction

Progression from prediabetes to type 2 diabetes (T2D) and subsequent complications can be delayed or prevented through the detection of people with prediabetes followed by a preventive intervention.^1-4 Both the US Preventive Services Task Force and the American Diabetes Association (ADA) highlight the importance of early screening and detection of prediabetes to alter the course toward T2D and its complications.^5,6 Despite this, prediabetes remains a highly underdetected condition.⁷ Only around 11% to 19% of people meeting the criteria for prediabetes are aware of their condition,^8,9 highlighting a significant gap in prediabetes detection.

The detection of people with prediabetes currently requires a blood glucose test, using either fasting plasma glucose, an oral glucose tolerance test, or hemoglobin A1c (HbA_1c).⁵ However, as people with prediabetes often remain asymptomatic,¹⁰ these invasive and costly procedures present barriers to widespread screening.^11-13 Thus, interest has grown in alternative screening tools that can identify the proportion of the population for whom a blood glucose test is warranted.

The use of machine learning-based models for prediabetes screening has gained attention due to their ability to recognize patterns across factors that are useful in disease detection.¹⁴ Specifically, there has been interest in developing models based on non-laboratory and easily accessible features, as these reduce the need for, and the resources spent on, laboratory testing.^15,16 This approach makes it possible to screen a larger group of asymptomatic people and to enable screening in areas without access to laboratory equipment.^12,17,18 The majority of the developed non–laboratory-based machine learning models aimed to detect both undiagnosed T2D and prediabetes without specifically assessing the models’ ability to detect prediabetes alone,¹⁴,^18-39 leaving the performance of prediabetes detection unknown. Among studies that aimed to detect both undiagnosed T2D and prediabetes and that also evaluated the models’ ability to detect prediabetes alone, the models’ performance in prediabetes detection was found to be inferior to their performance in detecting undiagnosed T2D.^40,41 These findings underline the necessity for a model specifically tailored to prediabetes detection, as the features in the combined detection models appear to prioritize the detection of undiagnosed T2D rather than prediabetes. In addition, a recent systematic review and meta-analysis by Liu et al evaluated the performance of different non–laboratory-based models developed to detect undiagnosed T2D in their ability to detect prediabetes. This review found that the models performed insufficiently when applied to the prediabetes population, as indicated by an area under the receiver operating characteristic curve (ROC AUC) ranging from 0.64 to 0.74. Consequently, it was concluded that these models are unsuitable for prediabetes detection.⁴²

To the best of the authors’ knowledge, only a limited number of studies have investigated non–laboratory-based machine learning models specifically developed for prediabetes detection.¹³,^43-47 These studies based their model development on registry data from wider community screenings, eg, the National Health and Nutrition Examination Survey (NHANES), the Korean National Health and Nutrition Examination Survey (KNHANES), or the Indonesian National Basic Health Survey (INBHS).¹³,^43-47 However, these models were not validated in the populations where they were intended to be implemented, which may compromise their performance when implemented in practice. Machine learning models for prediabetes detection tend to perform better when validated in the same population for which they were developed.^15,42,48 This highlights the importance of developing and validating a screening tool using a data set sampled from the specific target population where it will be applied in clinical practice.^15,42 Therefore, this study aimed to develop and validate a machine learning-based screening tool that relied on non-laboratory and easily accessible data collected in a target population for the detection of people with prediabetes.

Methods

Data Source and Feature Extraction

The data used to develop the machine learning model originated from a project between Steno Diabetes Center North Denmark and three municipalities in the northern region of Denmark. The project’s aim was to perform health screenings for adults (≥ 18 years) within community settings to identify undiagnosed prediabetes and T2D and direct participants to existing municipal interventions. Screenings were conducted in companies across the municipalities by recruiting companies that mainly hired individuals with lower education levels, as this is recognized as a risk factor for prediabetes and T2D.^49,50 Moreover, one municipality held a health check day at its health center, mainly targeting elderly and unemployed residents. The screening took place between August 2022 and April 2024. All eligible participants provided written informed consent. According to Danish legislation, ethical approval was not necessary for the study.

At the screening, demographic data were collected from the participants (Supplemental Material 1). The demographic data included gender, age, height, weight, ethnicity (both mother and father), body fat percentage, waist circumference (measured midway between the lower rib and the upper part of the hip socket with a measuring tape), blood pressure (both systolic and diastolic), educational level, known hypertension, taking blood pressure medication, smoking status, and family history of diabetes (limited to parents or siblings). Education level, known hypertension, blood pressure medication, smoking status, and family history of diabetes were self-reported by the participants. All the demographic data were considered as potential features, except for ethnicity, which was excluded for being a near-constant feature with low predictive value because most of the screened participants had a mother and father of Danish ethnicity (93.01% and 92.22%, respectively). The detection target of the model was the presence of prediabetes, defined as an HbA_1c between 39 and 47 mmol/mol, following ADA’s definition of prediabetes.⁵ The data set comprised information from 501 participants, all of whom had a measured HbA_1c level from the screening visit and did not have previously diagnosed or screening-detected T2D. A point-of-care testing device (Affinion 2, Abbott) was used to measure HbA_1c.

Preprocessing of Data

All handling and analysis of data were performed in Python version 3.11.5. Initially, the prediction target (HbA_1c), originally recorded as a continuous value, was encoded as a binary variable. Participants with an HbA_1c ≥ 39 mmol/mol but < 48 mmol/mol were classified as having prediabetes, while those with an HbA_1c < 39 mmol/mol were classified as normoglycemic. All participants with previously diagnosed or screening-detected T2D were excluded before data preprocessing. All nominal categorical features were binary encoded, while ordinal categorical features were ordinal encoded (Table 1). The body mass index (BMI) was calculated from each participant’s height and weight. Subsequently, height and weight were removed from the data set, leaving 12 potential features for model development (Table 1). Missing data were imputed using single imputation by chained equations, as none of the features had a considerable amount of missing data (<4%). Finally, the data set was divided into training (70%) and validation (30%) sets, stratified by the target variable to maintain the original proportion of classes.
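The target encoding, BMI derivation, and stratified split described above can be sketched in plain Python. This is an illustrative sketch rather than the study's code: the function names, the assumption that height is recorded in centimeters, the fixed random seed, and the rounding of the per-class split sizes are all assumptions, since the text does not report which routines were used.

```python
import random

PREDIABETES_LOWER = 39  # mmol/mol, lower HbA1c bound for prediabetes (from the text)
T2D_LOWER = 48          # mmol/mol, upper bound of the prediabetes range (from the text)


def encode_target(hba1c_mmol_mol):
    """Return 1 for prediabetes, 0 for normoglycemia, None for values outside both classes."""
    if hba1c_mmol_mol >= T2D_LOWER:
        return None  # such participants were excluded before preprocessing
    return 1 if hba1c_mmol_mol >= PREDIABETES_LOWER else 0


def bmi(weight_kg, height_cm):
    """Body mass index in kg/m^2 (height in cm is an assumption of this sketch)."""
    height_m = height_cm / 100
    return weight_kg / height_m ** 2


def stratified_split(labels, validation_fraction=0.30, seed=0):
    """Split row indices so that each class keeps roughly the same share in both sets."""
    rng = random.Random(seed)
    train_idx, valid_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_valid = round(len(idx) * validation_fraction)
        valid_idx.extend(idx[:n_valid])
        train_idx.extend(idx[n_valid:])
    return sorted(train_idx), sorted(valid_idx)


# Toy HbA1c values for illustration only (not study data).
labels = [encode_target(v) for v in (34, 38, 39, 41, 47, 36, 35, 40, 37, 33)]
train_idx, valid_idx = stratified_split(labels)
```

Splitting within each class, rather than over the pooled rows, is what keeps the proportion of prediabetes cases similar in the training and validation sets.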

Table 1.

Overview of Extracted Features, Their Data Type, and Levels of the Coded Features for Model Development.

Feature name	Data type	Levels of coded feature
Age	Continuous	-
Gender	Nominal	0 = Woman
		1 = Man
BMI	Continuous	-
Body fat (%)	Continuous	-
Waist circumference (cm)	Continuous	-
Systolic blood pressure (mm Hg)	Continuous	-
Diastolic blood pressure (mm Hg)	Continuous	-
Known hypertension	Nominal	0 = No
		1 = Yes
Blood pressure medication	Nominal	0 = No
		1 = Yes
Smoking	Nominal	0 = Never
		1 = Prior or current
Family history of diabetes	Nominal	0 = No
		1 = Yes
Educational level	Ordinal	1 = Primary school or lower
		2 = Upper secondary education
		3 = Vocational education
		4 = Short-cycle higher education
		5 = Bachelor’s/professional bachelor’s degree
		6 = Master’s degree or higher

Educational level was ordinal coded, while the remaining categorical features were binary coded.

Abbreviations: BMI, body mass index.
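The coding scheme in Table 1 can be expressed as simple lookup tables. The sketch below is illustrative only: the dictionary keys (feature names in snake case) and the function name `encode_record` are assumptions, while the category labels and codes are copied from Table 1.

```python
# Code books copied from Table 1; key names are hypothetical.
BINARY_CODES = {
    "gender": {"Woman": 0, "Man": 1},
    "known_hypertension": {"No": 0, "Yes": 1},
    "blood_pressure_medication": {"No": 0, "Yes": 1},
    "smoking": {"Never": 0, "Prior or current": 1},
    "family_history_of_diabetes": {"No": 0, "Yes": 1},
}

EDUCATION_CODES = {
    "Primary school or lower": 1,
    "Upper secondary education": 2,
    "Vocational education": 3,
    "Short-cycle higher education": 4,
    "Bachelor’s/professional bachelor’s degree": 5,
    "Master’s degree or higher": 6,
}


def encode_record(raw):
    """Return a copy of `raw` with categorical answers replaced by their Table 1 codes."""
    encoded = dict(raw)
    for feature, codes in BINARY_CODES.items():
        if feature in encoded:
            encoded[feature] = codes[encoded[feature]]
    if "educational_level" in encoded:
        encoded["educational_level"] = EDUCATION_CODES[encoded["educational_level"]]
    return encoded


# Hypothetical participant record for illustration.
example = encode_record(
    {"age": 52, "gender": "Man", "smoking": "Never", "educational_level": "Vocational education"}
)
```

Continuous features such as age pass through unchanged; only the categorical answers are mapped.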

Feature Selection

Before the model training, a feature selection process was conducted to identify the most informative features. This involved sequential forward feature selection within a five-fold cross-validation framework, using a decision tree as the estimator and ROC AUC as the performance criterion. To avoid overfitting during the feature selection, initial hyperparameters were set: the minimum sample leaf was set to 30, and the maximum depth was set to three. Furthermore, the class weight was adjusted to account for an expected imbalanced data set, due to the prevalence of prediabetes. A feature was included in the model if its addition resulted in an increase of ≥1% in the ROC AUC. Following the feature selection, a grid search was performed on the hyperparameters (the minimum sample leaf and maximum depth) using five-fold cross-validation to identify the optimal combination. The range of the minimum sample leaf was 10 to 60, while the range of the maximum depth was 2 to 10.
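The stopping rule of the forward selection can be illustrated with a minimal, library-free sketch. Several points are assumptions: the ≥1% criterion is interpreted here as an absolute gain of 0.01 in ROC AUC, the starting baseline of 0.5 (no discrimination) is a choice of this sketch, and `score_fn` stands in for the mean cross-validated ROC AUC of a decision tree. The feature names and scores below are made-up toy values, not study results.

```python
def forward_select(candidates, score_fn, min_gain=0.01, baseline=0.5):
    """Greedy sequential forward selection.

    candidates: iterable of feature names.
    score_fn:   callable mapping a list of feature names to a score
                (for example, a mean cross-validated ROC AUC).
    min_gain:   smallest absolute improvement required to add a feature (assumed reading of ">=1%").
    baseline:   score credited to a model with no features (assumption).
    """
    selected = []
    remaining = list(candidates)
    best_score = baseline
    while remaining:
        # Score every one-feature extension of the current subset.
        scored = [(score_fn(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(scored)
        if top_score - best_score < min_gain:
            break  # the best remaining feature does not help enough
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected, best_score


# Toy score table (illustrative numbers only).
TOY_SCORES = {
    frozenset({"feature_a"}): 0.70,
    frozenset({"feature_b"}): 0.66,
    frozenset({"feature_c"}): 0.64,
    frozenset({"feature_a", "feature_b"}): 0.76,
    frozenset({"feature_a", "feature_c"}): 0.74,
    frozenset({"feature_a", "feature_b", "feature_c"}): 0.762,
}


def toy_score(features):
    return TOY_SCORES[frozenset(features)]


selected, score = forward_select(["feature_a", "feature_b", "feature_c"], toy_score)
```

In the toy run, the third feature adds only 0.002 to the score, so selection stops with two features.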

Model Training and Validation

The selected features from the feature selection process, along with the hyperparameters identified through the grid search, were used to train a decision tree model. The class weight was once again adjusted to account for an expected imbalanced data set. A decision tree was chosen because it makes explicit how the features contribute to the detection. An explainable model was prioritized, as explainability is important for health care professionals and patients when machine learning models are used in decision-making in clinical practice.^51-53 The model was validated on the validation set using ROC AUC. In addition, the model’s performance was assessed using confusion matrices, accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and a calibration plot. The optimal threshold was determined based on Youden’s Index to identify the threshold that maximized the sum of sensitivity and specificity. Furthermore, an analysis of false positive and false negative predictions was conducted to assess whether the misclassified participants’ HbA_1c levels were close to or far from the prediabetes threshold, providing useful insight into the potential clinical consequences of the prediction errors.
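As a reminder of what the ROC AUC measures, it equals the probability that a randomly chosen person with prediabetes receives a higher predicted probability than a randomly chosen person with normoglycemia, with ties counted as one half. The sketch below computes it by direct pairwise comparison. It is an illustration of the metric, not the routine used in the study, and the labels and scores are toy values.

```python
def roc_auc(labels, scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute ROC AUC")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: three of the four positive/negative pairs are ordered correctly.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The tie handling matters for a decision tree, because every person who ends in the same end node receives the same predicted probability.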

Comparative Analysis

Given the potential for improved performance offered by more complex, non-explainable models, a random forest model was included as a comparator to the decision tree model. The comparative analysis aimed to investigate the trade-off between model explainability and model performance. The data preprocessing procedures were identical for both models, and the same training and validation sets were used. The model was trained using all available features, due to the nature of the algorithm, and with hyperparameters aligned to those specified for the decision tree to ensure comparability, aside from the number of estimators, which was set to 100 to reflect the ensemble nature of the model. The model performance was evaluated using ROC AUC.

Results

Among the 501 participants, 88 were identified with prediabetes. The training set included a total of 350 participants (289 with normoglycemia and 61 with prediabetes). The validation set included a total of 151 participants (124 with normoglycemia and 27 with prediabetes). Table 2 shows the baseline characteristics of the participants in the training and validation set. In both data sets, the mean age of the participants was approximately 50 years, the mean BMI was approximately 27, and the mean waist circumference was 96 cm. Visual inspection of the baseline data showed an overall fair division of data into training and validation sets. However, there were some numerical differences regarding males, former/current smokers, family history of diabetes, and educational level.
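The text states that the class weight was adjusted for imbalance but does not specify the scheme. As an illustration only, the sketch below applies one common choice, inverse-frequency ("balanced") weighting, in which each class receives the weight n_samples / (n_classes × class count), to the training-set counts reported above. Whether the study used this exact scheme is an assumption.

```python
def balanced_class_weights(class_counts):
    """Inverse-frequency weights: n_samples / (n_classes * count) for each class."""
    n_samples = sum(class_counts.values())
    n_classes = len(class_counts)
    return {cls: n_samples / (n_classes * count) for cls, count in class_counts.items()}


# Training-set counts from the text: 289 normoglycemia (0), 61 prediabetes (1).
weights = balanced_class_weights({0: 289, 1: 61})
```

Under this scheme, each prediabetes case in the training set would count roughly 4.7 times as much as a normoglycemia case (289/61), so both classes contribute equally in total.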

Table 2.

Baseline Characteristics of the Training and Validation Set.

Covariate	Training set (N = 350)	Validation set (N = 151)
Age (years)	49.81 ± 14.17	49.68 ± 15.36
Gender
Male	204 (58.29%)	74 (49.01%)
Female	146 (41.71%)	77 (50.99%)
BMI	27.02 ± 4.91	27.47 ± 6.04
Body fat (%)	28.36 ± 9.03	30.05 ± 10.37
Waist circumference (cm)	96.13 ± 13.75	96.22 ± 14.19
Systolic blood pressure (mm Hg)	135.27 ± 19.08	135.83 ± 17.81
Diastolic blood pressure (mm Hg)	84.43 ± 11.33	83.94 ± 10.53
Known hypertension
No	285 (81.43%)	120 (79.47%)
Yes	65 (18.57%)	31 (20.53%)
Blood pressure medication
No	283 (80.86%)	120 (79.47%)
Yes	67 (19.14%)	31 (20.53%)
Smoking
Never	242 (69.14%)	100 (66.23%)
Former/current	108 (30.86%)	51 (33.77%)
Family history of diabetes
No	265 (75.71%)	126 (83.44%)
Yes	85 (24.29%)	25 (16.56%)
Educational level
Primary school or lower	48 (13.71%)	15 (9.93%)
Upper secondary education	32 (9.14%)	6 (3.97%)
Vocational education	152 (43.43%)	57 (37.75%)
Short-cycle higher education	45 (12.86%)	29 (19.21%)
Bachelor’s/professional bachelor’s degree	48 (13.71%)	33 (21.85%)
Master’s degree or higher	25 (7.14%)	11 (7.28%)

Continuous data are presented as mean ± standard deviation. Categorical data are presented as numbers (percentages).

Abbreviations: BMI, body mass index; N, number.

Two out of the 12 potential features were selected based on the forward feature selection (Figure 1). The feature correlation matrix is presented in Supplemental Material 2. The two selected features were age and waist circumference. The grid search revealed that the best hyperparameter combination was a minimum sample leaf of 30 and a maximum depth of four. However, the best hyperparameter combination only showed a minor improvement in terms of ROC AUC (0.7728 vs 0.7818). The resulting decision tree is displayed in Figure 2. Starting at the top of the tree, answering the questions with “yes” or “no” at each node (gray boxes) leads to an end node classified as either normoglycemia or prediabetes (green and blue boxes). The probability of having prediabetes is shown in the end nodes. Based on the decision tree, the risk of having prediabetes is determined primarily based on age, with a split threshold of 49.5 years; three out of four end nodes on the right side (age >49.5 years) classify a person as having prediabetes, while all three end nodes on the left side (age ≤49.5 years) classify a person as normoglycemic.

Figure 1.

Forward feature selection process, illustrating the increase in ROC AUC for each added feature.

Figure 2.

Decision tree for detecting prediabetes based on age and waist circumference. Starting at the top, the gray boxes contain the questions that lead a person to an end node, indicated by the green and blue boxes. Green boxes classify a person with normoglycemia, while blue boxes classify a person with prediabetes.

The decision tree model achieved an ROC AUC of 0.8297 on the training set and 0.7961 on the validation set (Figure 3). In comparison, the random forest model achieved an ROC AUC of 0.8785 on the training set and 0.7999 on the validation set, indicating only a slight improvement in ROC AUC. The calibration plot for the validation set indicated that the decision tree model tended to overestimate predicted probabilities (Supplemental Material 3).

Figure 3.

Receiver operating characteristic curves for the decision tree model, based on the training and validation sets.

Using the optimal threshold of 0.5527 on the validation set (Supplemental Material 4), the decision tree model demonstrated a sensitivity of 0.8519, a specificity of 0.6855, a positive predictive value of 0.3710, a negative predictive value of 0.9551, and an accuracy of 0.7152 (Supplemental Material 5). Table 3 shows the derived confusion matrix values at various thresholds. Figures 4 and 5 illustrate the distribution of HbA_1c levels for false positive and false negative predictions at the optimal threshold. The distribution at the remaining thresholds can be found in Supplemental Material 6. In total, 43 (28.48%) prediction errors occurred, comprising four false negative predictions (9.30%) and 39 false positive predictions (90.70%). Three of the four participants with false negative predictions had an HbA_1c level of 39 mmol/mol, while the last had an HbA_1c level of 40 mmol/mol. Among the participants with a false positive prediction, approximately 40% had an HbA_1c level near the prediabetes definition threshold at either 37 or 38 mmol/mol, while the remaining 60% had an HbA_1c between 30 and 36 mmol/mol.
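The reported metrics follow directly from the confusion matrix counts at the 0.5527 threshold in Table 3 (85 true negatives, 39 false positives, 4 false negatives, 23 true positives). The short sketch below reproduces the arithmetic; the function name and dictionary keys are illustrative.

```python
def screening_metrics(tn, fp, fn, tp):
    """Standard screening metrics derived from confusion matrix counts."""
    total = tn + fp + fn + tp
    errors = fp + fn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / total,
        "error_rate": errors / total,
        "fn_share_of_errors": fn / errors,
        "fp_share_of_errors": fp / errors,
    }


# Counts at the 0.5527 threshold, taken from Table 3.
metrics = {name: round(value, 4) for name, value in screening_metrics(85, 39, 4, 23).items()}
```

Rounded to four decimals, this gives a sensitivity of 0.8519, specificity of 0.6855, positive predictive value of 0.3710, negative predictive value of 0.9551, and accuracy of 0.7152, matching the values stated in the text.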

Table 3.

Confusion Matrix Values at Various Thresholds for the Decision Tree Model.

Threshold	True negatives	False positives	False negatives	True positives
0.8527	113	11	18	9
0.7321	106	18	11	16
0.5527	85	39	4	23
0.4660	70	54	4	23
0.3270	55	69	2	25
0.1363	39	85	2	25
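Youden's Index (J = sensitivity + specificity − 1) can be recomputed for each row of Table 3 to check which threshold maximizes it. The sketch below uses only the counts in the table; variable and function names are illustrative.

```python
# Rows of Table 3: (threshold, true negatives, false positives, false negatives, true positives)
TABLE_3 = [
    (0.8527, 113, 11, 18, 9),
    (0.7321, 106, 18, 11, 16),
    (0.5527, 85, 39, 4, 23),
    (0.4660, 70, 54, 4, 23),
    (0.3270, 55, 69, 2, 25),
    (0.1363, 39, 85, 2, 25),
]


def youden_j(tn, fp, fn, tp):
    """Youden's Index: sensitivity + specificity - 1."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1


j_by_threshold = {row[0]: youden_j(*row[1:]) for row in TABLE_3}
best_threshold = max(j_by_threshold, key=j_by_threshold.get)
```

Among the six listed thresholds, J is largest at 0.5527 (about 0.537), consistent with the threshold reported as optimal.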

Figure 4.

Distribution of HbA_1c levels among participants with false negative predictions from the decision tree model.

Figure 5.

Distribution of HbA_1c levels among participants with false positive predictions from the decision tree model.

Discussion

This study demonstrated that it is possible to detect prediabetes in a target population with a machine learning-based screening tool that relies on non-laboratory and easily accessible features with acceptable performance.

To the best of the authors’ knowledge, no previous studies within the field of non–laboratory-based screening tools specifically for prediabetes detection have based their model development on data sampled from a target population. However, previous studies have developed and validated non–laboratory-based machine learning models specifically developed for prediabetes detection using registry data. These models have achieved an ROC AUC ranging from 0.31 to 0.80.¹³,^43-47 In comparison, the ROC AUC of 0.7961 achieved in the present study with the decision tree model showed superior or equal performance in prediabetes detection. Thus, the decision tree model demonstrated acceptable performance, highlighting the potential of an explainable machine learning-based screening tool for prediabetes detection in a target population.

The comparator analysis based on the random forest model achieved an ROC AUC of 0.7999, showing only a minor performance improvement compared to the decision tree model’s ROC AUC of 0.7961. However, this minor improvement does not outweigh the advantages of explainability offered by the decision tree model. Given the comparable performance of the decision tree and its ability to provide interpretable and transparent decision-making, it appears more suitable for clinical practice, where explainability is essential for building clinical trust and facilitating implementation.^51-53

The decision tree model exhibited a high number of false positive predictions (90.7% of all prediction errors). This may present a limitation of the model in clinical practice, as it leads to unnecessary referral for blood glucose testing of people with normoglycemia, thus consuming laboratory resources and potentially causing anxiety among this group. However, reducing the number of false positive predictions inevitably results in more false negative predictions.⁵⁴ Since false positive predictions only result in a confirmatory blood glucose test, prioritizing the detection of as many people as possible with prediabetes may be preferable.^13,43 Notably, a considerable proportion (approximately 40%) of the false positive cases were close to the prediabetes threshold (≥39 mmol/mol). Although these are misclassified, they may represent a borderline risk group of prediabetes and T2D.⁵⁵ At the same time, all cases classified as false negatives were close to the prediabetes threshold, with an HbA_1c at either 39 mmol/mol or 40 mmol/mol. These prediction errors may represent less severe misclassifications, as they present a group with a lower risk of T2D compared to cases with HbA_1c levels that are closer to the threshold for T2D.^56-58 While the model has room for improvement, this suggests a reasonable balance between the false positive and false negative predictions made by the current model and highlights the importance of analyzing the prediction errors to decide on the trade-off between false positive and false negative predictions in future models within the research area.

The model developed in this study included age and waist circumference as features. Consistent with the current study, age was included in the models of previous studies.¹³,^43-47 Given that age is a known risk factor for prediabetes,^59-61 the inclusion of this feature in the models was understandable. Similarly, in the studies by Abbas et al,¹³ Choi et al,⁴⁴ and Wang et al,⁴⁶ waist circumference was also included in the developed models, consistent with the current study. In other similar studies, waist circumference was not available as a feature. However, these studies included other features related to body composition, such as BMI and waist-to-height ratio.^43,45,47 The BMI, waist circumference, and waist-to-height ratio can all be used as indicators of overweight and obesity,^62,63 meaning that each model included a feature related to overweight or obesity, which is a known risk factor for prediabetes.^64,65 In contrast to this study, the previous studies included between four and 11 features in the final models. This may be due to methodological differences related to model type or procedures for feature selection, where the inclusion of features, eg, relied on statistically significant bivariate analyses^45,46 or involved including all features without any selection.⁴⁷ However, the inclusion of only two features in the current model, without compromising performance, enhances the practicality of the screening tool for use in clinical practice, as it requires minimal data collection and processing. In addition, collecting age and waist circumference does not require any special equipment and can be easily performed by trained personnel, even in areas without access to health care providers. This makes the screening tool attractive for use in low-resource settings, where it could be deployed as a paper-based tool and integrated into existing outreach programs, such as community health screening initiatives or chronic disease prevention programs.

Although the model is not intended for self-administration by individuals at risk of prediabetes, it is essential to note that, if it were to be considered for such use, variability in waist circumference measurements may pose a concern. Previous studies have identified discrepancies between measurements taken by trained personnel and those obtained through self-measurement by individuals,^66-68 which could potentially affect the model’s performance by producing false positive or false negative results. Given the strong correlation between waist circumference and BMI reported in this study, BMI may be considered a potential alternative to waist circumference for self-administration as self-reported BMI has been associated with lower variability than self-measured waist circumference.⁶⁹ However, using BMI instead of waist circumference resulted in a slightly reduced validation ROC AUC of 0.7924 in the current study.

Defining prediabetes solely by HbA_1c in the current study limits the model’s ability to detect the broader prediabetes population in clinical practice, as it excludes those solely defined by impaired fasting glucose (IFG) or impaired glucose tolerance (IGT). Previous studies have shown that only around 10% of people with prediabetes fulfill all three criteria used to define prediabetes (HbA_1c, IFG, and IGT).^70,71 The overlap between HbA_1c-defined and IFG-defined prediabetes has been found to be between 10.36% and 17.53%, while the overlap between HbA_1c-defined and IGT-defined prediabetes has been found to be between 4.14% and 6.07%.^70,71 Thus, solely using HbA_1c to detect people with prediabetes leaves a considerable proportion of people with prediabetes defined by one of the other criteria undetected. In addition, some of the characteristics of prediabetes defined by the three criteria have been found to differ. For instance, the risk of having IFG was found to increase with male gender, while the risk of IGT increased with female gender, and no association of gender was found for HbA_1c.^70,71 Furthermore, older age was associated with a higher risk of having both HbA_1c-defined and IGT-defined prediabetes,^70,71 while unemployment was associated with IGT-defined prediabetes.⁷¹ This complicates the generalizability of the model for detecting IFG-defined or IGT-defined prediabetes, as the features used for detecting HbA_1c-defined prediabetes may deviate from the features that should be used for detecting IFG-defined and IGT-defined prediabetes. Another limitation of the current study is the size and origin of the data. The data were sampled in a relatively limited geographic area of Denmark, which may restrict the generalizability to the wider prediabetes population and other ethnicities.

Conclusion

In conclusion, it was possible to develop a machine learning-based screening tool for prediabetes with acceptable performance for a target population. The tool is easy to implement in clinical practice. It can be used on paper and requires only the collection of age and waist circumference, both of which are easily measurable without laboratory tests. Implementing the tool in clinical practice could improve screening and potentially increase the detection of prediabetes. Nonetheless, the model requires further refinement before implementation. Investigating its ability to detect IFG- and IGT-defined prediabetes, incorporating additional data, and performing external validation in another target population will be essential to confirm the model’s generalizability and applicability for prediabetes detection.

Supplemental Material

Supplemental material, sj-docx-1-dst-10.1177_19322968251376380 for Developing a Simple Non–Laboratory-Based Machine Learning Tool for Prediabetes Screening in a Target Population: A Proof-of-Concept Study by Tanja Fredensborg Holm, Thomas Kronborg, Morten Hasselstrøm Jensen and Stine Hangaard in Journal of Diabetes Science and Technology

Footnotes

Acknowledgements

The authors would like to express their gratitude to the municipalities involved for collecting the data and to Steno Diabetes Center North Denmark for providing the data used in the study.

Abbreviations

ADA, American Diabetes Association; BMI, body mass index; CDC, Centers for Disease Control and Prevention; FINDRISC, Finnish Diabetes Risk Score; HbA_1c, hemoglobin A1c; IFG, impaired fasting glucose; IGT, impaired glucose tolerance; INBHS, Indonesian National Basic Health Survey; KNHANES, Korean National Health and Nutrition Examination Survey; NHANES, National Health and Nutrition Examination Survey; ROC AUC, area under the receiver operating characteristics curve; T2D, type 2 diabetes.

Declaration of Conflicting Interests

The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: MHJ is full-time employed and owns shares in Novo Nordisk A/S. No conflicts of interest were declared by the remaining authors.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Supplemental Material

Supplemental material for this article is available online.

