Abstract
AI applications in finance, including those for probability-of-default (PD) modeling, largely rely on ML classification tools. Oversampling the heavily underrepresented class of defaulted borrowers is commonly treated as a mandatory preprocessing step. However, by computing more than a thousand confidence intervals for classification accuracy metrics, we demonstrate when such oversampling is actually worth engaging in. Moreover, we argue what share of the total initial sample the minority class should be oversampled to. Our findings are valuable primarily for credit risk modeling and Internal Ratings Based (IRB) banks, but they are not limited to these settings and apply generally to binary classification in the ML domain.
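The kind of procedure the abstract discusses, oversampling the defaulted-borrower class up to a chosen share of the sample before fitting a classifier, can be sketched as follows. This is a minimal illustration with synthetic data and simple random duplication; the function name `oversample_to_fraction`, the target fraction, and the data-generating process are all assumptions for the example, not the authors' actual procedure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic imbalanced "credit" data: defaults (class 1) are rare.
n = 2000
X = rng.normal(size=(n, 3))
y = (X[:, 0] + rng.normal(scale=2.0, size=n) > 3.0).astype(int)

def oversample_to_fraction(X, y, target_frac, rng):
    """Randomly duplicate minority-class (y == 1) rows until they make up
    `target_frac` of the augmented sample (illustrative helper, not the
    paper's method)."""
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    # Choose minority count m so that m / (m + len(majority)) == target_frac.
    m = int(round(target_frac * len(majority) / (1.0 - target_frac)))
    picked = rng.choice(minority, size=m, replace=True)
    idx = np.concatenate([majority, picked])
    return X[idx], y[idx]

# Oversample defaults to 30% of the sample, then fit a baseline classifier.
X_bal, y_bal = oversample_to_fraction(X, y, target_frac=0.3, rng=rng)
clf = LogisticRegression().fit(X_bal, y_bal)
```

Whether this step improves out-of-sample accuracy, and what `target_frac` to use, is exactly the question the paper addresses via confidence intervals on the accuracy metrics.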
