Overfitting is a common problem in the development of predictive models. It leads to an optimistic estimate of apparent model performance. Internal validation using bootstrapping techniques allows one to quantify the optimism of a predictive model and provides a more realistic estimate of its performance measures. Our objective is to build an easy-to-use command, bsvalidation, designed to perform a bootstrap internal validation of a logistic regression model.
A multivariable predictive model is a mathematical equation that relates multiple predictors for a particular individual to the probability of future occurrence of an outcome (Royston et al. 2009). Overfitting is a common problem in the development of these models, and it usually yields overly optimistic estimates of model performance (Steyerberg 2009). In this context, internal validation is essential to provide a more realistic estimate of a model's ability to predict the risk of the outcome in a new subject. Several solutions have been proposed to correct for this optimism (sample splitting, cross-validation, and its variants leave-one-out and leave-pair-out cross-validation). Among these strategies, bootstrapping has emerged as a popular way to correct optimistic estimates of apparent performance.
The transparent reporting of a multivariable prediction model for an individual prognosis or diagnosis (TRIPOD) statement is an evidence-based guide of recommendations to standardize reporting of predictive models. The TRIPOD statement recommends bootstrapping techniques to carry out internal model validation and shrinkage methods to adjust overfitted models (Moons et al. 2015; Collins et al. 2015).
Our objective is to develop a new command, bsvalidation, that performs internal model validation using bootstrapping techniques and is executable as a postestimation command after logistic or logit. Stata already provides postestimation commands to assess the apparent performance of a model: lroc assesses model discrimination, and estat gof assesses model calibration with a Hosmer–Lemeshow test. To the best of our knowledge, however, no internal validation command such as the one we present has been implemented in Stata to date.
2 Methods
bsvalidation must be executed after logistic or logit. The command estimates different performance measures in terms of overall model fit (that is, how close the predictions are to the actual outcome, related to the amount of variability explained); discrimination (that is, how well the model distinguishes between those with and without the outcome); and calibration (that is, how well predictions and observations agree). These measures are shown in table 1.
E:O ratio
Ideal value: 1. E:O < 1 indicates the model underestimates the total number of events; E:O > 1 indicates the model overestimates the total number of events.
Calibration-in-the-large (CITL)
Ideal value: 0. CITL < 0 indicates the predictions are systematically too high; CITL > 0 indicates the predictions are systematically too low.
Calibration slope
Ideal value: 1. A slope < 1 indicates the predictions are too extreme and the model is overfit; a slope > 1 indicates the predictions do not vary enough and the model is underfit.
Note: Brier_scaled = 1 − Brier_score / Brier_max.
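For intuition, the calibration measures above can be computed directly from a vector of predicted risks and observed outcomes. The following Python sketch is an illustration only — bsvalidation itself is implemented in Stata, and the function names here are our own — estimating E:O, CITL, the calibration slope, and the scaled Brier score with a small Newton–Raphson logistic fit:

```python
import numpy as np

def logit(p):
    return np.log(p / (1 - p))

def expit(x):
    return 1 / (1 + np.exp(-x))

def fit_logistic(X, y, offset=0.0, iters=25):
    """Newton-Raphson fit of a logistic model for y given X, with a fixed offset."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        mu = expit(X @ beta + offset)
        W = mu * (1 - mu)
        grad = X.T @ (y - mu)
        hess = X.T @ (X * W[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def performance(y, p):
    """E:O, CITL, calibration slope, and scaled Brier score for risks p."""
    lp = logit(p)                                # linear predictor
    ones = np.ones((len(y), 1))
    eo = p.sum() / y.sum()                       # expected:observed events
    citl = fit_logistic(ones, y, offset=lp)[0]   # intercept, slope fixed at 1
    slope = fit_logistic(np.column_stack([ones, lp]), y)[1]
    brier = np.mean((y - p) ** 2)
    brier_max = np.mean((y - y.mean()) ** 2)     # Brier score of a constant model
    return {"E:O": eo, "CITL": citl, "slope": slope,
            "Brier_scaled": 1 - brier / brier_max}
```

CITL is the intercept of a logistic recalibration model with the linear predictor as an offset; the calibration slope is the coefficient of the linear predictor in an unconstrained recalibration model.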
After the user has fit a logistic predictive model in the original sample using either the logit or logistic command, the validation command goes through the following algorithm:
1. It determines the model's apparent performance in the original sample (table 1).
2. It draws a bootstrap sample with replacement from the original sample.
3. It builds a new prediction model (bootstrap model) replicating the same modeling strategy used in the model being validated and determines its apparent performance in the bootstrap sample (bootstrap performance). If the original model is prespecified (that is, fit without variable selection), bsvalidation uses the original model specification without any variable-selection strategy.
4. It applies the bootstrap model to the original sample to determine its performance (test performance).
5. It calculates the model's optimism as the difference between the bootstrap performance and the test performance.
6. It repeats steps 2–5 a user-defined number of times to obtain a stable averaged estimate of the optimism.
7. Finally, it subtracts the averaged optimism estimate obtained in step 6 from the apparent performance estimated in step 1 to obtain the optimism-corrected performance estimate.
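The bootstrap loop above can be sketched generically. This illustrative Python function (the names and signatures are our own, not part of bsvalidation) corrects a single performance measure for optimism, given user-supplied fit and perf routines that replicate the full modeling strategy:

```python
import numpy as np

def optimism_corrected(data, fit, perf, reps=50, seed=1):
    """Bootstrap optimism correction (illustrative sketch).

    fit(data)         -> fitted model, replicating the full modeling strategy
    perf(model, data) -> a performance measure (e.g., the C-statistic)
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    apparent = perf(fit(data), data)            # apparent performance
    optimism = []
    for _ in range(reps):
        idx = rng.integers(0, n, n)             # resample with replacement
        boot = data[idx]
        model = fit(boot)                       # refit on the bootstrap sample
        boot_perf = perf(model, boot)           # bootstrap performance
        test_perf = perf(model, data)           # test performance on original data
        optimism.append(boot_perf - test_perf)  # per-replicate optimism
    return apparent - np.mean(optimism)         # optimism-corrected estimate
```

The same averaged optimism is subtracted from each apparent performance measure in turn.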
Our bsvalidation command also generates a calibration plot. Calibration is assessed using a lowess smoother function of predicted and observed risks for the overall sample. It also presents pairs of predicted and observed risks for groups defined by the user according to quantiles of predicted risk.
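For illustration, the grouped points on the calibration plot pair the mean predicted risk with the observed event proportion within each quantile group of predicted risk. A minimal Python sketch (our own, not the command's code):

```python
import numpy as np

def calibration_groups(y, p, groups=10):
    """Plot-ready pairs of mean predicted vs. observed risk per quantile group."""
    order = np.argsort(p)                 # rank observations by predicted risk
    bins = np.array_split(order, groups)  # roughly equal-sized groups
    pred = np.array([p[b].mean() for b in bins])
    obs = np.array([y[b].mean() for b in bins])
    return pred, obs
```

On a well-calibrated model, these pairs lie close to the 45-degree line, which is what the lowess smoother summarizes across the whole risk spectrum.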
3 The bsvalidation command
3.1 Syntax
The syntax for bsvalidation is
bsvalidation [varlist] [, options]
If the final model was prespecified, varlist will be empty. If the model was built using selection methods (backward, forward, or stepwise), those predictors previously assessed but excluded from the final model during the selection process should be included in varlist.
3.2 Options
reps(#) specifies the number of bootstrap samples. The default is 50 samples. If you are using Stata/IC, up to 800 bootstrap samples are supported. See help limits.
rseed(#) sets the random-number seed. This option can be used to obtain reproducible results. rseed(#) is equivalent to typing set seed # prior to calling bsvalidation.
adjust(string) displays the final model after applying a uniform shrinkage factor to the regression coefficients. string is one of the following:
bootstrap—uniform bootstrap shrinkage parameter from Steyerberg (2009).
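As a sketch of how a uniform shrinkage factor is applied (illustrative Python with hypothetical names; the factor itself would come from the bootstrap procedure of Steyerberg (2009)), the slope coefficients are multiplied by the factor, and the intercept is then re-estimated so that the average predicted risk matches the observed risk:

```python
import numpy as np

def shrink_coefficients(beta, X, y, factor):
    """Apply a uniform shrinkage factor to the slopes, then re-estimate the
    intercept so predictions stay calibrated in the large. X[:, 0] is a
    column of ones and beta[0] the intercept."""
    shrunk = beta.copy()
    shrunk[1:] *= factor
    offset = X[:, 1:] @ shrunk[1:]
    a = 0.0
    for _ in range(25):                  # Newton-Raphson on the intercept only
        mu = 1 / (1 + np.exp(-(a + offset)))
        a += (y - mu).sum() / (mu * (1 - mu)).sum()
    shrunk[0] = a
    return shrunk
```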
pr(#) and pe(#) specify the significance level threshold for variables to be removed from or entered into the model, respectively.
pr(#) alone requests backward elimination. Variables with p-value ≥ pr() are eligible to be removed.
pe(#) alone requests forward selection. Variables with p-value < pe() are eligible to be entered.
Specifying both pr(#) and pe(#) requests backward stepwise selection.
When a predictor-selection approach is considered, a backward elimination strategy is generally preferred (Harrell 2015).
Furthermore, bsvalidation displays the number of times each variable is selected in the final model after applying the same selection strategy to each bootstrap sample. Other variable-selection strategies, such as the lasso (least absolute shrinkage and selection operator), are not included in bsvalidation. See help lasso.
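Backward elimination by Wald p-values, as replicated in each bootstrap sample, can be sketched as follows (illustrative Python; the pr() threshold is mirrored, but the function names are our own):

```python
import numpy as np
from math import erfc, sqrt

def logistic_fit(X, y, iters=30):
    """Newton-Raphson logistic regression; returns coefficients and Wald p-values."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        mu = 1 / (1 + np.exp(-(X @ beta)))
        W = mu * (1 - mu)
        H = X.T @ (X * W[:, None])
        beta += np.linalg.solve(H, X.T @ (y - mu))
    se = np.sqrt(np.diag(np.linalg.inv(H)))
    pvals = np.array([erfc(abs(z) / sqrt(2)) for z in beta / se])  # two-sided
    return beta, pvals

def backward_select(X, y, names, pr=0.1):
    """Repeatedly drop the least significant predictor while any p-value >= pr."""
    keep = list(range(1, X.shape[1]))      # column 0 is the intercept
    while keep:
        _, pvals = logistic_fit(X[:, [0] + keep], y)
        worst = int(np.argmax(pvals[1:]))  # ignore the intercept
        if pvals[1:][worst] < pr:
            break
        keep.pop(worst)
    return [names[j] for j in keep]
```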
models displays the final model for each bootstrap sample. If the final model is prespecified, this option does not apply.
eform causes the coefficient table to be displayed in exponentiated form: for each coefficient, exp(b) rather than b is displayed. Standard errors and confidence intervals are also transformed.
graph produces a calibration plot of observed against expected probabilities. Calibration is plotted in groups across the risk spectrum. Confidence intervals for the groupings are displayed as well as a lowess smoother.
This allows one to assess the calibration at the individual level. If adjust() is considered, then the calibration plot will be adjusted.
group(#) specifies the number of percentiles to divide the predicted risks into. The default is to divide the predicted risks into 10 equally sized groups.
min(#) allows one to fix a lower bound of observed and expected probabilities to be plotted.
If min() is higher than the minimum probability predicted by the model, it is automatically reset to the minimum predicted probability rounded to the first decimal.
max(#) allows one to fix an upper bound of observed and expected probabilities to be plotted.
If max() is lower than the maximum probability predicted by the model, it is automatically reset to the maximum predicted probability rounded to the first decimal.
3.3 Stored results
bsvalidation stores the following in e():
4 Examples
We illustrate the use of bsvalidation with a predictive model developed to estimate the risk of low birthweight using the dataset lbw.dta from Hosmer, Lemeshow, and Sturdivant (2013).
In the first example, the command bsvalidation runs a bootstrap internal validation of a prespecified model.
Calibration plot
In this first example, we fit a prespecified logistic model to predict the risk of low birthweight (defined as birthweight lower than 2,500 grams), using the mother’s age (age), weight at last menstrual period (lwt), race (race), smoking status during pregnancy (smoke), previous history of premature labor (ptl), hypertension (ht), and uterine irritability (ui) as predictors. The bsvalidation output shows all apparent performance statistics (for example, C-statistic = 0.746). These performance measures are then adjusted for the estimated optimism, which is calculated from 50 (the default number) bootstrap samples (for example, C-statistic = 0.694). Additionally, by using the graph option, we visualize a calibration plot of observed against expected risks of low birthweight in groups defined by deciles of predicted risk, along with a smooth fitted line. Further, it shows scatterplots with the distribution of events (x symbol) and nonevents (hollow circle symbol) along the x axis.
In the second example, bsvalidation performs a bootstrap internal validation of a model that was previously built using a backward-selection strategy with significance level (p = 0.1). After the backward-selection strategy, the predictors age and ptl were dropped. The model coefficients are finally adjusted by the bootstrap-estimated uniform shrinkage factor or coefficient.
Here, the model is built using a backward-selection strategy in the original data. The predictors selected in the process are lwt, race, smoke, ht, and ui (logistic command). Other candidate predictors (age and ptl), initially assessed but excluded during the selection process, are added to the varlist of the bsvalidation command to replicate the modeling strategy used during the development of the original model. The output shows both apparent and optimism-adjusted performance measures. Additionally, because the backward-selection strategy is replicated in each bootstrap sample, the output also shows the number of times each predictor is selected in the final model (for example, lwt was included in 75 out of 100 bootstrap models). Finally, the coefficients of the final model are adjusted by a bootstrap-based uniform shrinkage factor to correct for overfitting; thus, the coefficients are multiplied by 0.712.
5 Conclusion
bsvalidation is a useful command for running bootstrap internal validation of predictive logistic regression models. It makes this internal validation method more accessible to researchers, promoting more complete and better reporting of predictive models according to the TRIPOD guidelines.
6 Limitations
Although bsvalidation helps standardize the internal validation process, a disadvantage of bootstrap validation is that it can validate only models built following fixed or automated modeling strategies; dynamic modeling decisions that rely on analyst judgment cannot be replicated. Other important steps during the modeling process, such as collapsing factor variables, assessing nonlinearities, or testing interaction terms, cannot be handled by bsvalidation. The command does not handle other shrinkage methods, such as the least absolute shrinkage and selection operator (Tibshirani 1996), and cannot handle missing values.
7 Future work
In the future, we will work to address some of the limitations mentioned above, and we will extend the command to validate other regression models commonly used in biomedical research, such as Cox regression.
8 Programs and supplemental materials
To install a snapshot of the corresponding software files as they existed at the time of publication of this article, type
References
Collins, G. S., J. B. Reitsma, D. G. Altman, and K. G. M. Moons. 2015. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): The TRIPOD statement. British Medical Journal 350: g7594. https://doi.org/10.1136/bmj.g7594.
Ensor, J., K. I. E. Snell, and E. C. Martin. 2018. pmcalplot: Stata module to produce calibration plot of prediction model performance. Statistical Software Components S458486, Department of Economics, Boston College. https://EconPapers.repec.org/RePEc:boc:bocode:s458486.
Harrell, F. E., Jr. 2015. Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis. 2nd ed. Cham, Switzerland: Springer.
Hosmer, D. W., Jr., S. Lemeshow, and R. X. Sturdivant. 2013. Applied Logistic Regression. 3rd ed. Hoboken, NJ: Wiley.
Moons, K. G. M., D. G. Altman, J. B. Reitsma, J. P. A. Ioannidis, P. Macaskill, E. W. Steyerberg, A. J. Vickers, D. F. Ransohoff, and G. S. Collins. 2015. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): Explanation and elaboration. Annals of Internal Medicine 162: W1–W73. https://doi.org/10.7326/M14-0698.
Riley, R. D., D. van der Windt, P. Croft, and K. G. M. Moons. 2019. Prognosis Research in Health Care: Concepts, Methods, and Impact. New York: Oxford University Press. https://doi.org/10.1093/med/9780198796619.001.0001.
Royston, P., K. G. M. Moons, D. G. Altman, and Y. Vergouwe. 2009. Prognosis and prognostic research: Developing a prognostic model. British Medical Journal 338: b604. https://doi.org/10.1136/bmj.b604.
Steyerberg, E. W. 2009. Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating. Cham, Switzerland: Springer.
Steyerberg, E. W., A. J. Vickers, N. R. Cook, T. Gerds, M. Gonen, N. Obuchowski, M. J. Pencina, and M. W. Kattan. 2010. Assessing the performance of prediction models: A framework for traditional and novel measures. Epidemiology 21: 128–138. https://doi.org/10.1097/EDE.0b013e3181c30fb2.
Tibshirani, R. 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B 58: 267–288.
Van Houwelingen, J. C., and S. le Cessie. 1990. Predictive value of statistical models. Statistics in Medicine 9: 1303–1325. https://doi.org/10.1002/sim.4780091109.