Abstract
Introduction/Objectives:
A non-laboratory-based pre-diabetes/diabetes mellitus (pre-DM/DM) risk prediction model developed from the Hong Kong Chinese population showed good external discrimination in a primary care (PC) population, but the estimated risk level was significantly lower than the observed incidence, indicating poor calibration. This study explored whether recalibrating/updating methods could improve the model’s accuracy in estimating individuals’ risks in PC.
Methods:
We performed a secondary analysis on the model’s predictors and blood test results of 919 Chinese adults with no prior DM diagnosis recruited from PC clinics from April 2021 to January 2022 in HK. The dataset was randomly split in half into a training set and a test set. The model was recalibrated/updated based on a seven-step methodology, including model recalibrating, revising and extending methods. The primary outcome was the calibration of the recalibrated/updated models, indicated by calibration plots. The models’ discrimination, indicated by the area under the receiver operating characteristic curves (AUC-ROC), was also evaluated.
Results:
Recalibrating the model’s regression constant, with no change to the predictors’ coefficients, improved the model’s accuracy (calibration plot intercept: −0.01, slope: 0.69). More extensive methods could not improve any further. All recalibrated/updated models had similar AUC-ROCs to the original model.
Conclusion:
The simple recalibration method can adapt the HK Chinese pre-DM/DM model to PC populations with different pre-test probabilities. The recalibrated model can be used as a first-step screening tool and as a measure to monitor changes in pre-DM/DM risks over time or after interventions.
Introduction
Early detection and management of diabetes mellitus (DM) are important to prevent complications and premature mortality. Identifying individuals with pre-diabetes (pre-DM), which is potentially reversible, would be most effective to prevent progression to DM. Multivariable risk prediction models have been developed to facilitate the early detection of individuals with pre-DM and DM.1,2 Non-laboratory-based risk models could identify and triage those with higher risks for targeted diagnostic blood tests and preventive interventions for better allocation of resources. 3 Given the high prevalence of undiagnosed pre-DM/DM in Hong Kong, 4 two new Hong Kong (HK) Chinese non-laboratory-based pre-DM/DM risk prediction models were developed from a population-representative Population Health Survey (PHS) 2014/15 dataset 4 using logistic regression (LR) and machine learning (ML) methods, respectively. 5 Both models showed equally good external validity in discriminating pre-DM/DM cases from non-cases in a Chinese adult population recruited from primary care (PC) clinics in Hong Kong. 6 However, both models showed poor external calibration, as indicated by the significant differences between the absolute pre-DM/DM risks predicted by the models and the observed incidence. Calibration plots of the models indicated that the models tended to systematically underestimate the absolute pre-DM/DM risks for individuals in PC.
The Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement recommends that risk models should be recalibrated/updated based on the characteristics of the intended population if poor calibration is observed during validation, 7 as poorly calibrated risk models could have a lower clinical utility in practice. 8 In our case, the HK Chinese risk models appear to underestimate the level of absolute pre-DM/DM risks for individuals in the PC population, who tend to have a higher prevalence of pre-DM/DM than the general population and could therefore provide them with a false sense of reassurance regarding their pre-DM/DM risk levels. Furthermore, absolute risk estimates can be used to guide the intensity of interventions and evaluate their effectiveness.9-13 Recalibrating/updating the HK Chinese pre-DM/DM risk models is required to enable their applications as reliable outcome measures in PC.
Previous studies have shown that recalibrating existing risk models based on the characteristics of risk factors and outcome incidence of the intended population can substantially improve the model’s external predictive accuracy.14,15 Model recalibration is often the preferred method to generalize existing models to other populations. It builds on the associations established between risk factors and the outcome in the original population while enhancing the prediction accuracy through recalibrating the models’ prediction algorithm to accommodate different predictor distributions and pre-test probability in another population. 16 In addition to model recalibration, more extensive methods, such as re-estimating the regression coefficient of individual predictors (model revision) and including additional predictors that were not available during the original model development process (model extension), have also been deployed to further enhance the predictive performance of models in external populations.
As we found no significant difference in the external performance between the 2 (LR and ML) HK Chinese models in the PC population, 6 this study aimed to recalibrate/update the LR model, which is more straightforward and transparent than the ML model. We evaluated whether recalibrating, revising and/or extending the HK Chinese non-laboratory-based pre-DM/DM risk prediction LR model could improve the accuracy of estimating pre-DM/DM risks, as well as the discriminatory ability in case-finding of pre-DM/DM in the Hong Kong PC population.
Subjects
Study Design and Data Source
This was a secondary analysis of data on predictors of the model and blood test results of 919 Chinese adults with no prior DM diagnosis who were recruited from PC clinics in Hong Kong in our study on the external validity of HK Chinese pre-DM/DM risk prediction models. 6 We randomly split the dataset in half into a training set and a test set. We recalibrated/updated the HK Chinese pre-DM/DM risk prediction LR model based on the training set and evaluated the performance of the updated models using the test set. The validation study population was a convenience voluntary sample recruited from public/private PC clinics from 8th April 2021 to 19th January 2022 in Hong Kong. Inclusion criteria were adults between 18 and 84 years with no prior doctor-diagnosed DM, coronary heart disease, stroke, chronic kidney disease, cancer or anemia. Exclusion criteria were individuals who were non-Chinese, could not communicate in Chinese/English, were pregnant, or were too ill to participate. Details on the study population, participant recruitment, and study procedures of the PC validation study are available in the published study protocol. 17 The study was approved by the institutional review board of The University of Hong Kong/Hong Kong Hospital Authority Hong Kong West Cluster (UW19-831) and Hong Kong Hospital Authority Kowloon Central/Kowloon East Cluster (REC(KC/KE)-21-0042/ER-3). The study is registered at the US ClinicalTrials.gov (NCT04881383) and the HKU clinical trials registry (HKUCTR-2808).
In addition to the original model predictors, 5 the PC validation study also collected data on other risk factors of pre-DM/DM, including the presence/absence of a family history of DM and weekly vegetable consumption. The dataset also included participants’ blood test results on oral glucose tolerance test (OGTT) and hemoglobin A1c (HbA1c) levels, which were used to diagnose pre-DM/DM according to the World Health Organization and American Diabetes Association’s definitions.18,19 The incidence of pre-DM/DM in the PC validation study population was 53.43% (n = 491; pre-DM: 49.18% [n = 452], DM: 4.24% [n = 39]). Each training set and test set contained at least 100 cases of pre-DM/DM after the random splitting, which achieved an acceptable statistical power to detect any effect of model recalibration/update on predictive performance. 20
Materials and Methods
Outcome Measures
The primary outcome measure of this study was the calibration accuracy of the recalibrated/updated model in the test set, measured by the intercept and calibration slope on the calibration plot and the concordance between the predicted absolute pre-DM/DM risk estimates and the observed incidence. Ideally, the model should have perfect calibration with an intercept of 0 and slope of 1 in the calibration plot. 21 The secondary outcome was the discrimination of the recalibrated/updated models in detecting pre-DM/DM cases, as evaluated by the area under the receiver operating characteristic curves (AUC-ROC).
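The concordance between predicted risks and observed incidence can be illustrated with a grouped calibration table. The following is a minimal sketch under stated assumptions, not the analysis code used in this study: the function name `calibration_table`, the number of groups, and the toy data are all hypothetical. Participants are sorted by predicted risk, split into equally sized groups, and the mean predicted risk of each group is compared with its observed proportion of cases.

```python
def calibration_table(predicted, observed, n_bins=5):
    """Return (mean predicted risk, observed case proportion, group size)
    for groups of participants ordered by predicted risk."""
    pairs = sorted(zip(predicted, observed))
    n = len(pairs)
    rows = []
    for b in range(n_bins):
        # Slice the sorted participants into n_bins near-equal groups.
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        obs_rate = sum(y for _, y in chunk) / len(chunk)
        rows.append((mean_pred, obs_rate, len(chunk)))
    return rows


# Toy data (hypothetical): predicted risks and 0/1 outcomes.
predicted = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
observed = [0, 1, 0, 1, 1, 1, 1, 1]
table = calibration_table(predicted, observed, n_bins=2)
for mean_pred, obs_rate, size in table:
    print(f"predicted {mean_pred:.2f}  observed {obs_rate:.2f}  n={size}")
```

In this toy example the observed proportion exceeds the mean predicted risk in both groups, which is the pattern of systematic underestimation that a well-calibrated model would not show.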
Statistical Analyses
The original multivariable LR model estimates the absolute pre-DM/DM risk levels from the weighted sum of the non-laboratory-based predictors, namely age, body mass index (BMI), waist-hip-ratio (WHR), smoking status, sleep duration, weekly fruit consumption and amount of vigorous activity per week. 5 Additionally, it contains two interaction terms, that is, age$^2$ and age × sleep duration, which can be summarized altogether as: Pre-DM/DM risk = $1/(1 + e^{-\text{LP}})$, where the linear predictor $\text{LP} = \alpha + \beta_1 x_1 + \dots + \beta_9 x_9$ and $x_1, \dots, x_9$ are the predictor terms. 5 Here, $\alpha$ is the regression constant, and $\beta_1$ to $\beta_9$ are the regression coefficients of the predictors. Poor calibration was indicated by the calibration plot of the original LR model in external validation on the PC population (intercept: 1.79 [1.64, 1.94], calibration slope: 0.74 [0.60, 0.87]).
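The logistic formula above can be written as a short function. This is an illustrative sketch only: the function name `predicted_risk` is hypothetical, and the constant and coefficient in the example are placeholders chosen to make the arithmetic easy to follow, not the published values of the HK Chinese model.

```python
import math


def predicted_risk(alpha, betas, predictors):
    """Risk = 1 / (1 + exp(-LP)), with LP = alpha + sum(beta_i * x_i)."""
    if len(betas) != len(predictors):
        raise ValueError("one coefficient is needed per predictor term")
    linear_predictor = alpha + sum(b * x for b, x in zip(betas, predictors))
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Placeholder values (hypothetical): LP = -1.0 + 0.5 * 2.0 = 0, so risk = 0.5.
example_risk = predicted_risk(-1.0, [0.5], [2.0])
print(example_risk)
```

Because the regression constant enters the linear predictor additively, changing it shifts every individual's risk in the same direction without altering the ranking of individuals.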
Using the training set, we recalibrated/updated the model using the methodologies recommended by Janssen et al 22 and Steyerberg et al 23 (Table 1). In brief, we applied 7 step-wise methods in an attempt to recalibrate/update the model in which the first 2 are considered as model recalibrating methods, the third and fourth are considered model revising methods that revise the regression coefficients of the predictors, and the final 3 are model extending methods that extend the model with the additional predictors.
Step-Wise Methodology Used to Recalibrate and Update Multivariable Logistic Regression Risk Prediction Model, as Proposed by Janssen et al and Steyerberg et al.
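The two recalibration methods can be sketched as follows. The sketch assumes the forms commonly described in the model-updating literature, since the exact specification in Table 1 is not reproduced here: an intercept-only update that keeps the original linear predictor as an offset, and a logistic recalibration that estimates an intercept and one overall slope. The function names, the Newton-Raphson fitting routine, and the toy data are hypothetical and are not the study's R or SPSS code.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_intercept_update(lp, y, n_iter=50):
    """Intercept-only update: logit(p) = a + lp, slope fixed at 1."""
    a = 0.0
    for _ in range(n_iter):
        p = [sigmoid(a + v) for v in lp]
        grad = sum(yi - pi for yi, pi in zip(y, p))
        hess = sum(pi * (1.0 - pi) for pi in p)
        step = grad / hess
        a += step
        if abs(step) < 1e-10:
            break
    return a


def fit_logistic_recalibration(lp, y, n_iter=50):
    """Intercept and overall slope: logit(p) = a + b * lp."""
    a, b = 0.0, 1.0
    for _ in range(n_iter):
        p = [sigmoid(a + b * v) for v in lp]
        w = [pi * (1.0 - pi) for pi in p]
        # Score vector (g0, g1) and information matrix [[h00, h01], [h01, h11]].
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * v for yi, pi, v in zip(y, p, lp))
        h00 = sum(w)
        h01 = sum(wi * v for wi, v in zip(w, lp))
        h11 = sum(wi * v * v for wi, v in zip(w, lp))
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if max(abs(da), abs(db)) < 1e-10:
            break
    return a, b


# Toy training data (hypothetical): original linear predictors and outcomes.
lp = [-2.0, -1.0, -1.0, 0.0, 0.0, 1.0, 1.0, 2.0]
y = [0, 1, 0, 1, 0, 1, 1, 1]
a1 = fit_intercept_update(lp, y)
a2, b2 = fit_logistic_recalibration(lp, y)
print(f"intercept-only correction: {a1:.3f}")
print(f"recalibration intercept {a2:.3f}, overall slope {b2:.3f}")
```

After the intercept-only update, the mean predicted risk in the training data equals the observed incidence, which is why this method corrects a systematic under- or overestimation while leaving the predictors' coefficients untouched.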
We evaluated the performance, i.e. the calibration (calibration plot) and discrimination (AUC-ROC), of each updated model in the split test set. DeLong’s test was used to compare the AUC-ROCs of the updated models to that of the original model. 24 All statistical analyses were done using R 3.5.1 and IBM SPSS Statistics version 26.
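For illustration, the AUC-ROC can be computed as the proportion of case/non-case pairs in which the case receives the higher predicted risk, with ties counted as one half. The sketch below is a plain-Python assumption of that definition with hypothetical names and toy data; it does not implement DeLong's test, and it is not the R or SPSS code used for the analyses.

```python
import math


def auc_roc(scores, labels):
    """Probability that a random case scores higher than a random non-case."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both cases and non-cases are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def shift_on_logit_scale(risk, delta):
    """Add a constant to the logit of a risk, as an intercept update does."""
    logit = math.log(risk / (1.0 - risk))
    return 1.0 / (1.0 + math.exp(-(logit + delta)))


# Toy data (hypothetical).
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
auc_original = auc_roc(scores, labels)
auc_shifted = auc_roc([shift_on_logit_scale(s, 1.0) for s in scores], labels)
print(auc_original, auc_shifted)
```

Shifting every prediction by the same amount on the logit scale preserves the ranking of individuals, so an intercept-only recalibration is expected to leave the AUC-ROC unchanged.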
Results
The characteristics of participants were similar between the training and test datasets (Table 2). The mean ages of the training and test sets were 51.7 and 51.0 years, respectively. The proportions of females were 66.7% (n = 306) and 66.3% (n = 305) in the training and test sets, respectively. The pre-DM/DM incidences of the training set and test set were 54.2% (n = 249) and 52.6% (n = 242), respectively. The calibration plot of the original model in the test set is shown in Figure 1.
Participants’ Characteristics in the Split Training Set and the Test Set.
Abbreviations: BMI, body mass index; DBP, diastolic blood pressure; SBP, systolic blood pressure; WHR, waist hip ratio.

Calibration plot of the original model to detect pre-diabetes mellitus and diabetes mellitus on the test set (N = 460). The x-axis is the predicted risk estimates of pre-DM/DM, and the y-axis is the observed case incidence. The curves were fitted based on restricted cubic splines. At the bottom of the graphs, histograms of the predicted risks are shown for the participants with (1) and without (0) pre-DM/DM.
Table 3 summarizes the updates applied to each recalibrated/updated model and their respective predictive performance in the test set. While all updated models improved calibration, there was no significant difference in the discrimination between any of the 7 updated models (AUC-ROC: 0.742-0.750) and the original (AUC-ROC: 0.746). At the study’s proposed sensitivity of 75%, 17 both of the recalibrated models (based on methods 1-2) had specificities, positive predictive values and negative predictive values of 0.60, 0.67 and 0.68, respectively, which were identical to those of the original model as the recalibration methods did not update any coefficients of the prediction algorithm of the model. At a sensitivity of 75%, the specificities, positive predictive values, and negative predictive values of the updated models (based on methods 3-7) ranged from 0.57 to 0.61, 0.66 to 0.68, and 0.68 to 0.69, respectively, which were not statistically different from those of the original model.
Model Discrimination Performance of Models Obtained by Different Recalibration/Update Methods in the Test Set (N = 460).
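The operating characteristics reported at a fixed sensitivity can be illustrated with a small example. The text does not state how the cut-off corresponding to 75% sensitivity was selected, so the rule below (the highest cut-off whose sensitivity reaches the target) is an assumption, and the function name and toy data are hypothetical.

```python
def operating_point(scores, labels, target_sensitivity=0.75):
    """Pick the highest cut-off with sensitivity >= target and report
    sensitivity, specificity, PPV and NPV at that cut-off."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both cases and non-cases are required")
    for cut in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 0)
        if tp / n_pos >= target_sensitivity:
            fn, tn = n_pos - tp, n_neg - fp
            return {
                "cutoff": cut,
                "sensitivity": tp / n_pos,
                "specificity": tn / n_neg,
                "ppv": tp / (tp + fp),
                # NPV is undefined when nobody is classified as negative.
                "npv": tn / (tn + fn) if (tn + fn) > 0 else float("nan"),
            }


# Toy data (hypothetical): predicted risks and 0/1 outcomes.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 1, 0, 0, 0]
result = operating_point(scores, labels, target_sensitivity=0.75)
print(result)
```

Because the cut-off is defined by the ranking of predicted risks, a recalibration that only shifts or rescales the linear predictor selects the same individuals at a given sensitivity, which is consistent with the identical specificities and predictive values reported for methods 1 and 2.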
The calibration plots of the recalibrated models (methods 1 and 2) had an intercept and slope of −0.01 [−0.22, 0.20] and 0.69 [0.51, 0.87], and −0.02 [−0.22, 0.18] and 0.88 [0.65, 1.10], respectively (Figure 2a and b). In method 3, we found that WHR and age had significantly stronger effects on predicting pre-DM/DM risks in the test set. Thus, on top of applying the overall update coefficient (βoverall: 0.364), additional revision factors (γWHR: 7.323 and γAge: 0.024) were applied to their respective coefficients. The calibration plot of the updated model 3 showed an intercept and slope of −0.07 [−0.28, 0.14] and 0.88 [0.67, 1.09], respectively (Figure 2c). In method 4, which re-estimated the coefficients of all predictors without selection, we noted that the coefficients of most predictors were different from their respective original values except for weekly vigorous recreational activity time. The intercept and calibration slope of the updated model 4 were −0.08 [−0.29, 0.13] and 0.84 [0.64, 1.04], respectively (Figure 2d).

Calibration plots of the recalibrated/updated models to detect pre-diabetes mellitus and diabetes mellitus on the test set (N = 460). The x-axis is the predicted risk estimates of pre-DM/DM, and the y-axis is the observed case incidence. The curves were fitted based on restricted cubic splines. At the bottom of the graphs, histograms of the predicted risks are shown for the participants with (1) and without (0) pre-DM/DM. (a) Recalibrated model in method 1. (b) Recalibrated model in method 2. (c) Revised model in method 3. (d) Revised model in method 4. (e) Extended model (a) in method 5. (f) Extended model (b) in method 5. (g) Extended model (a) in method 6. (h) Extended model (b) in method 6. (i) Extended model in method 7.
In methods 5 to 7, we applied additional predictors, namely, the presence/absence of a family history of DM and/or weekly vegetable consumption, to extend the original model. The additional predictors did not contribute any significant additional effects on predicting the absolute pre-DM/DM risks in any of the extended models. In method 5, we added the additional predictors one at a time to the updated model 3 to create 2 extended models (models 5a and 5b). The calibration plots indicated an intercept and slope of −0.07 [−0.28, 0.14] and 0.88 [0.67, 1.09] for model 5a, and −0.06 [−0.27, 0.15] and 0.87 [0.66, 1.07] for model 5b (Figure 2e, Model 5a: extended model with the presence/absence of a family history of DM, and Figure 2f, Model 5b: extended model with weekly vegetable consumption). In method 6, we added the additional predictors one at a time to the updated model 4 to create 2 extended models (models 6a and 6b), which resulted in calibration plots with an intercept and slope of −0.08 [−0.29, 0.13] and 0.84 [0.64, 1.04] for model 6a, and −0.06 [−0.27, 0.15] and 0.82 [0.63, 1.02] for model 6b (Figure 2g, Model 6a: extended model with the presence/absence of a family history of DM, and Figure 2h, Model 6b: extended model with weekly vegetable consumption). In method 7, we included both additional predictors into the model and re-estimated the coefficients of all the original predictors. The intercept and calibration slope of the extended model in method 7 were −0.06 [−0.27, 0.15] and 0.82 [0.63, 1.02], respectively (Figure 2i).
Discussion
This study assessed whether recalibrating/updating the HK Chinese non-laboratory-based LR risk prediction model could improve its performance, namely calibration and discrimination, in a PC population in Hong Kong. We found that a simple recalibration of the model’s regression constant was sufficient to improve the calibration, that is, the accuracy in estimating the absolute pre-DM/DM risks for individuals in the PC population. It should be noted that re-estimating the regression coefficients or extending the model with additional predictors, including a family history of DM and weekly consumption of vegetables, did not improve its calibration any more than that obtained by simple model recalibration. Furthermore, we demonstrated the robustness of the original model’s validity in terms of its discrimination (AUC-ROC) between pre-DM/DM cases and non-cases, which did not change significantly despite different update methods.
Our findings support those reported by Masconi et al, 25 where model recalibration was sufficient to improve the calibration of 5 existing DM models. The same study also found that total re-estimation of all the regression coefficients could not improve the models’ performance any further, 25 as, potentially, the data from the external populations could not add more information to the predictor-outcome associations established in the original development population. In contrast, Xu et al 26 reported a significant improvement in discrimination when they revised all the coefficients of the well-established Framingham DM risk model according to an older Chinese adult population. Findings from previous studies seem to indicate that when a model is applied to an external population within the same culture as the development population, the intercorrelation of effects of the predictors to outcome could remain relative and significant25,26 and extensive update methods might not be needed to improve the model’s performance. However, if the external population had vast cultural or racial differences from the development population, differences in genetic predisposition related to pre-DM/DM could undermine the relative effects of the predictors of outcome, and more extensive update methods to re-estimate the coefficients of the predictors might be needed. Since the development and external validation population were both derived from the Hong Kong Chinese population in our case,5,17 we demonstrated that simple recalibration to adjust to the difference in incidences was the most adequate method to adapt the HK Chinese risk model for application in the Hong Kong primary care setting.
We noted that including additional DM risk factors, that is, a family history of DM and weekly vegetable consumption, did not significantly improve the model’s performance. The variability of these risk factors in the split training set might not be large enough to contribute a significant prediction effect to the outcome. This result aligns with a previous study conducted by Simmons et al, 27 which also found no improvement in prediction accuracy when additional dietary predictors were included to extend an existing DM risk model. Another potential explanation is that several metabolic risk factors found to be strongly associated with pre-DM/DM were already included as predictors in the original model, for example, age, BMI, and WHR, and could have dominated the potential effects of the additional factors on predicting pre-DM/DM risk. Also, since fruit consumption was already included as a predictor, the lack of improvement when adding vegetable consumption to the model could be related to the multicollinearity effect of these 2 factors. When developing the original LR model, Dong et al 5 applied the assessment of multicollinearity and bidirectional stepwise consideration of factors based on the Akaike information criterion (AIC), which helped to ensure the robustness of the model. Furthermore, the development population was derived from a sizeable population-representative dataset. This provided sufficient power for precise estimates of the association coefficients between the predictors and pre-DM/DM risk, 5 thereby supporting its generalisability to an external local population.
Given the robustness of the model, we confirmed that the original model was valid for screening to differentiate the high-risk individuals from low-risk individuals for further testing that confirms the diagnosis of pre-DM/DM in the Hong Kong Chinese PC population. The re-calibrated HK Chinese risk prediction model could also be used to accurately estimate the absolute pre-DM/DM risks for individuals presenting to PC, who tend to have a higher prevalence of pre-DM/DM than the general population. The HK Chinese pre-DM/DM risk prediction model is novel in that it includes non-laboratory-based lifestyle predictors, for example, sleep duration, fruit consumption and amount of weekly vigorous activity, 5 which emphasizes the behavior-disease-risk link of pre-DM/DM, thereby motivating individuals to adopt positive lifestyle behavioral changes to prevent DM progression. Individuals could also use the estimated absolute risk predicted by the recalibrated model to assess and monitor their efforts in behavioral changes. Based on the self-regulation model and the attribution theory, individuals would perceive their actions as more effective if they observe an agreement on the changes they have made with their anticipated effects. 28
There were a few limitations in this study. First, we deployed convenience voluntary sampling that included self-referral and snowball sampling during participant recruitment in the validation study. This method could have attracted individuals with higher pre-DM/DM risks to participate. Thus, the study population’s incidence might not accurately represent the actual incidence of pre-DM/DM in primary care in Hong Kong. Second, we could not access the original development PHS 2014/15 dataset and, therefore, could not combine the development and validation population to re-estimate the coefficients in the updated models (method 4), as recommended by Janssen et al. 22 Third, the diagnosis of pre-DM/DM was based on 1 single blood test of OGTT and blood HbA1c level in our study, which could have overestimated the case incidence. Also, as the PHS 2014/15 did not include postprandial blood glucose level by OGTT for case definition, 4 the current definition might have overestimated the incidence in our study. Nonetheless, the incidence in the study population remained high (48.42%; n = 445) when we used the same case definition that PHS 2014/15 used. 4 A repeat test within 1 month may be considered to confirm the diagnosis in future studies, but this may increase the burden on the participants and research resources. Fourth, the model recalibration/update results were specific to the Hong Kong Chinese PC population and may not be generalizable to Chinese populations in other parts of the world due to potential lifestyle and environmental differences. Further recalibration/update of the HK Chinese non-laboratory-based risk model should be carried out before its application to other Chinese populations.
Our study found that simple recalibration was sufficient to improve the model’s calibration accuracy but did not significantly improve discrimination in case finding of pre-DM/DM. We conclude that the HK Chinese non-laboratory-based pre-DM/DM risk prediction model can be used as a first-step screening tool. With its dichotomous prediction outcome of a likely versus unlikely case of pre-DM/DM, it can help to identify high-risk individuals for further blood tests to detect pre-DM/DM in asymptomatic Chinese adults presenting to primary care. In contrast, by taking the high prevalence of pre-DM/DM in primary care into consideration, the absolute risk levels estimated by the recalibrated HK Chinese pre-DM/DM risk prediction model can serve as a reliable non-laboratory-based measure to monitor changes in individuals’ risk level of pre-DM/DM over time or following interventions.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was funded by the Health and Medical Research Fund, Health Bureau, Government of the Hong Kong Special Administrative Region (reference number: 17181641). The funding organisation did not play any role in the design and conduct of the study, collection, management, analysis, interpretation of the data, or manuscript preparation.
