A novel ensemble model of multi-class credit assessment based on multi-source fusion theory

Abstract

With the development of artificial intelligence technology, assessment methods based on machine learning, especially ensemble learning, have attracted increasing attention in the field of credit assessment. However, most ensemble assessment models are structurally complex and costly to tune, and few of them are simultaneously lightweight, universal and efficient. This paper presents a new ensemble model for personal credit assessment. First, considering the conflicts and differences among multiple sources of information, a new method is proposed to correct the category prior information by using a difference measure. Then, the revised prior information is fused with the current sample information with the help of Bayesian data fusion theory. The model can integrate the advantages of multiple benchmark classifiers to reduce the interference of uncertain information. To verify the effectiveness of the proposed model, several typical ensemble classification models are selected and empirically studied using real customer credit data from a commercial bank in China. Across the various assessment criteria, the results show that the proposed model not only effectively improves multi-class classification performance, but also outperforms other advanced multi-class credit assessment models in terms of parameter tuning and generalizability. This paper supports the examination and approval work of commercial banks and other financial institutions.

Keywords

Ensemble model; multi-class credit assessment; information fusion theory

1 Introduction

With the continuous development of the credit business of commercial banks, credit risk assessment has become a focus of attention. A scientific and effective credit assessment model is an effective way for banks and other financial institutions to identify defaulting customers and potentially valuable customers. Assessing the credit risk level of customers in a timely and accurate manner is also important for these institutions to avoid risk [1].

There are many studies on credit risk assessment. Most of them use the customer’s historical credit record and a classification method to build a credit rating assessment model, so as to obtain the customer’s probability of default [2]. Fernandes et al. [3] judged whether a customer is in default by logistic regression (LR). Although the LR model is simple and easy to understand, its assessment accuracy is not high, and statistical assessment models usually rely on strong assumptions [4, 5].

With the growing demand for credit, machine learning algorithms from artificial intelligence have broken the limitations that traditional statistical methods impose through their data distribution assumptions [6, 7]. Akkoc proposed a fuzzy neural network classification model, which is superior to traditional assessment models such as linear discriminant analysis (LDA) and LR in terms of estimated misclassification cost and correct classification rate [8]. Harris et al. [9] constructed an enhanced binary classification model based on the support vector machine, which can effectively handle high-dimensional data, achieves better classification accuracy than statistical assessment, and also reduces the computational cost. However, the data structure of credit risk in real-world environments is complex. A single classifier is only suited to certain types of data and cannot handle all data situations well. Most models can only meet the current credit assessment situation and can hardly be applied universally to other credit datasets [10, 11].

Ensemble classification algorithms largely remedy this defect [12, 13]. Zhu et al. [14] compared six credit risk assessment methods using enterprise financial data and concluded that the ensemble learning method has higher classification performance. Chen et al. [15] integrated multiple classification models based on support vector machines by adjusting kernel function and penalty factor parameters to improve the classification accuracy. Xia et al. [16] proposed a tree-based heterogeneous ensemble model for binary credit classification, whose main advantage is that it dynamically assigns weights to benchmark classifiers based on overfitting measures.

In the classification assessment of bank credit ratings, more and more scholars have turned to ensemble classification methods and have made some progress [17]. Since the credit rating of bank customers is affected by many factors, the shortcomings of the existing ensemble assessment models are also obvious. First, in order to improve classification accuracy, complex assessment frameworks are often built or extensive parameter tuning is performed, which is time-consuming and reduces interpretability. Second, most existing ensemble classification assessment methods are built on benchmark classifiers of the same type or of different types, and rarely consider the differences between classifiers and the distinctive performance of a single classifier [18, 19].

To address these shortcomings, an ensemble model for multi-class credit assessment based on Bayesian fusion theory is proposed. It takes into account the conflicts and differences of multi-source information and integrates the advantages of several different benchmark classifiers, including LR, DT, SVM, KNN, RF and XGboost. With the help of Bayesian fusion theory, it seeks an optimal combination of the differences between the benchmark classifiers and the individual performance of each classifier. The model makes full use of the differences and complementarity between multiple sources of information to reduce the interference of the uncertain information of a single classifier, thus improving the classification accuracy. After Bayesian fusion, the model classifies with higher accuracy and stronger stability, and also generalizes well.

The main contributions of this study are as follows: (1) A new multi-class credit assessment ensemble model is constructed, providing a more refined five-level classification assessment than most binary studies; (2) Five different single classifiers are fused in an innovative way with the help of Bayesian theory, which makes full use of prior information, takes into account the conflicts and differences of multi-source information, integrates the advantages of each classifier and reduces the uncertainty of a single classifier, thus improving the overall assessment performance of the model; (3) The multi-class assessment model constructed in this paper is more lightweight, and the benchmark classifiers mostly adopt conventional default parameters, which reduces the time cost of parameter tuning and greatly improves the operability and efficiency of the model. To further verify the effectiveness of the proposed model, three different representative ensemble models are selected for comparison, and the comparison results show the superior classification performance of the proposed model.

The remainder of this paper is organized as follows. Section 2 presents related work. Subsequently, Section 3 describes the details of the proposed model. Section 4 presents the experimental design, which mainly includes dataset description, data preprocessing, performance metrics, and the implementation and assessment details. Section 5 presents an analysis of the experimental results. The conclusion is provided in Section 6.

2 Related work

2.1 Classification model

In this section, we mainly describe the advantages and disadvantages of the chosen basic models and present the underlying principles of these models. Our goal is to construct an efficient, general and lightweight multi-class credit assessment model. The model needs to combine the differences between the benchmark classifiers with the individual performance of each classifier, so that it achieves better classification accuracy and stronger generalization ability than a single classifier, which is more sensitive to the data. To meet these requirements, we select six different benchmark classification models, namely logistic regression (LR) [8], decision tree (DT) [20], support vector machine (SVM) [21], K-nearest neighbor (KNN) [22], random forest (RF) [23] and XGboost [24]. The logistic regression model is a widely used statistical modeling technique, and its most important feature is that it is highly interpretable relative to other multi-class algorithms (e.g., SVMs, neural networks, RF, etc.); DT is a classical and commonly used method; SVM has good robustness and generalization ability; KNN is simple and effective [25]; the RF algorithm handles unbalanced datasets well; and XGboost is more flexible and efficient, with good computing performance [26].

2.2 Information fusion theory

Information fusion, also known as data fusion, originated from military applications in the 1970s. It is the combination or integration of information or data from multiple sources at different levels of abstraction to eventually obtain more complete, reliable and accurate information or inference [27]. In credit rating classification research, data fusion mainly refers to classification methods that combine or integrate complex and diverse information from different sources [28]. On the premise of ensuring sufficient and effective information, the redundancy of information should be reduced as far as possible to improve the classification accuracy. Among information fusion techniques, Bayesian fusion theory is widely used in information synthesis for its superiority in providing reliable prior information [29, 30].

Bayes’ theorem is a result in probability theory proposed by Thomas Bayes, an English mathematician of the 18th century. With the development of computers, Bayes’ theorem has been widely used in many fields. Since it requires large-scale data and computation to show its effect, it has found favor with more researchers in the era of big data [31, 32]. Its basic idea is to find the result that maximizes the posterior probability [33].

Definition 1: A, B are two events and the conditional probability of event A occurring given that event B has occurred, denoted as P (A|B):

$P (A | B) = \frac{P (AB)}{P (B)}$ (1)

P (AB) denotes the probability that events A and B occur simultaneously. If events A and B are independent of each other, then: $P (A | B) = \frac{P (AB)}{P (B)} = \frac{P (A) * P (B)}{P (B)} = P (A)$ (2)

The derivation above shows that if A and B are mutually independent events, then the probability of event A occurring is unaffected by event B. In general, Equation (1) can be rearranged into: $P (AB) = P (B) * P (A | B) = P (A) * P (B | A)$ (3)

Definition 2: Let S be the sample space of an experiment Ω, and let B₁, B₂, ⋯ be a set of events of Ω. When $\cup_{i = 1}^{\infty} B_{i} = S$ and B_iB_j = ∅ (i ≠ j; i, j = 1, 2, ⋯), B₁, B₂, ⋯ is said to be a partition of the sample space S. If P (B_i) > 0, i = 1, 2, ⋯, then the total probability formula for any event A is: $P (A) = \sum_{i = 1}^{\infty} P (B_{i}) P (A | B_{i})$ (4)

Definition 3: On the basis of the conditional probability and the total probability formula, the Bayesian formula is derived: $P (B_{i} | A) = \frac{P (B_{i}) P (A | B_{i})}{\sum_{j = 1}^{\infty} P (B_{j}) P (A | B_{j})}$ (5)
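As a small illustration of Equations (4) and (5), the following sketch computes the posterior over a finite partition. The function name `bayes_posterior` and the three-event numbers are hypothetical and serve only to show the arithmetic.

```python
def bayes_posterior(priors, likelihoods):
    """Return P(B_i | A) for every i, given P(B_i) and P(A | B_i)."""
    # Numerators of Equation (5): P(B_i) * P(A | B_i)
    joint = [p * l for p, l in zip(priors, likelihoods)]
    # Total probability P(A), Equation (4), over a finite partition
    evidence = sum(joint)
    if evidence == 0:
        raise ValueError("P(A) is zero; the posterior is undefined")
    return [j / evidence for j in joint]


# Hypothetical partition with three events (illustrative values only).
priors = [0.5, 0.3, 0.2]        # P(B_1), P(B_2), P(B_3)
likelihoods = [0.1, 0.4, 0.5]   # P(A | B_1), P(A | B_2), P(A | B_3)
posterior = bayes_posterior(priors, likelihoods)
most_likely = posterior.index(max(posterior))
```

Although B₁ has the largest prior here, the evidence shifts the largest posterior to B₂, which is the behavior that the fusion rules in Section 3 rely on.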

3 Proposed methodology

3.1 Bayesian data fusion theory

Bayesian data fusion theory is a common reasoning method based on probability statistics. It is mainly based on Bayesian theorem to determine the prior distribution according to the existing category prior information. After obtaining the sample information, the prior information is fused with the current sample information, so as to obtain the posterior distribution of the category population and realize the inference of the sample category population. In this process, the uncertainty is described by probability, and the sample to be classified is assigned to the category to which it is most likely to belong or to the category with the least expected risk. Compared with other estimation methods, the greatest advantage of Bayesian theory is that it makes full use of prior information [34].

In this study, the classification accuracy of each classifier output is taken as one information source. According to the Bayesian fusion theorem, when only two information sources S₁ and S₂ are considered, the fusion rule of the two information sources can be expressed in the following form: $P (C_{i} | S_{1}, S_{2}) = \frac{P (S_{1}, S_{2} | C_{i}) P (C_{i})}{\sum_{j = 1}^{N} P (S_{1}, S_{2} | C_{j}) P (C_{j})}$ (6)

Here C_i is the ith category, N is the number of categories, P (C_i) is the prior probability of the ith category, and P (S_k|C_i) (i = 1, 2, …, 5; k = 1, 2, …, M) represents the conditional probability obtained by the kth information source in the ith category. The probability distributions of the information sources are assumed to be independent of each other. Therefore, Equation (6) can be expressed as: $P (C_{i} | S_{1}, S_{2}) = \frac{P (S_{1} | C_{i}) P (S_{2} | C_{i}) P (C_{i})}{\sum_{j = 1}^{N} P (S_{1} | C_{j}) P (S_{2} | C_{j}) P (C_{j})}$ (7)

By the same inference, when M information sources are considered, the fusion rule of M information sources can be expressed as: $P (C_{i} | S_{1}, S_{2}, \dots, S_{M}) = \frac{P (C_{i}) \prod_{k = 1}^{M} P (S_{k} | C_{i})}{\sum_{j = 1}^{N} P (C_{j}) \prod_{k = 1}^{M} P (S_{k} | C_{j})}$ (8)
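The fusion rule in Equation (8) can be sketched in a few lines of Python. The function name `fuse_sources` and the toy probabilities are assumptions made for illustration; they are not taken from the experiments in this paper.

```python
def fuse_sources(priors, cond_probs):
    """Fuse M information sources following Equation (8).

    priors[i]        -> P(C_i)
    cond_probs[k][i] -> P(S_k | C_i)
    """
    scores = list(priors)
    for source in cond_probs:            # multiply in one source at a time
        scores = [s * p for s, p in zip(scores, source)]
    total = sum(scores)                  # denominator of Equation (8)
    if total == 0:
        raise ValueError("all fused scores are zero")
    return [s / total for s in scores]


# Toy example: three categories and two sources (illustrative values only).
priors = [0.6, 0.3, 0.1]
cond_probs = [
    [0.2, 0.5, 0.3],   # P(S_1 | C_i)
    [0.1, 0.6, 0.3],   # P(S_2 | C_i)
]
fused = fuse_sources(priors, cond_probs)
best = fused.index(max(fused))
```

Multiplying the sources in one at a time relies on the independence assumption stated before Equation (7); with strongly correlated sources the product would overstate the evidence.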

Compared with other fusion methods, Bayesian fusion can make full use of prior information. In view of the conflicts and differences existing in multi-source information, the distribution of category prior information is modified according to the difference measure, and the prior information is fused with the current sample information by means of Bayesian data fusion theory, so as to obtain the posterior distribution of category population, enrich the completeness of information and reduce the overall uncertainty.

3.2 Proposed multi-class classification credit assessment ensemble model based on Bayesian information fusion (BIF-MCCA)

Due to the differences among benchmark classifiers, the resulting information sources also differ and conflict. In order to combine the differences between the benchmark classifiers with the individual strengths of each classifier, this study constructs a multi-class classification credit assessment ensemble model based on Bayesian information fusion (BIF-MCCA), considering the differences among multiple information sources. The model uses the difference measure to modify the category prior information, and then fuses the revised prior information with the current sample information with the help of Bayesian data fusion theory, thus integrating the advantages of multiple benchmark classifiers and reducing the interference of uncertain information. The benchmark classifiers selected in this study all use default parameters, which reduces the time cost of parameter tuning compared with other ensemble models. The BIF-MCCA model is concise and lightweight in structure, with higher time efficiency and lower space complexity. The flow chart of the BIF-MCCA model is shown in Fig. 1. The calculation process is as follows:

Fig. 1

Flow chart of BIF-MCCA algorithm.

Step 1: Estimate the category prior probability distribution P (C_i) from the training sample set of bank customer credit data, where C_i (i = 1, 2, …, 5) represents the five credit categories. The prior probability P (C_i) can generally be estimated by the proportion of samples of each category in the total number of samples, that is, P (C_i) = n_i/n, where n_i is the number of samples of category C_i in the training set and n is the total number of training samples.

Step 2: Feed the training sample set into each benchmark classifier to obtain the overall classification accuracy P (S_k) of each information source and the conditional probability P (S_k|C_i) of each information source under each category. Then calculate the difference measure δ_k of each information source according to Equation (9), where P (S_k) (k = 1, 2, …, 5) is the classification accuracy of the kth information source. $\delta_{k} = \frac{P (S_{k})}{\sum_{j = 1}^{5} P (S_{j})}$ (9)

Step 3: The conditional probability of the same category differs across information sources, so equal-weight fusion is inappropriate. Therefore, the distribution weight ε_i of the prior probability of each category is obtained from the difference measures of the information sources by Equation (10), which reflects the different contributions of the information sources to the prior information of each category. The revised estimate of the category prior probability $\hat{P} (C_{i})$ is then obtained by Equation (11), where P (S_k|C_i) (k = 1, 2, …, 5; i = 1, 2, …, 5) is the classification precision of the kth classifier under the ith category. $\varepsilon_{i} = \frac{\sum_{k = 1}^{5} \delta_{k} * P (S_{k} | C_{i})}{\sum_{j = 1}^{5} \sum_{k = 1}^{5} \delta_{k} * P (S_{k} | C_{j})}$ (10) $\hat{P} (C_{i}) = \varepsilon_{i} * P (C_{i})$ (11)

Step 4: Fusion yields the posterior probability distribution of each credit rating category of the customer according to Equation (12). $\hat{P} (C_{i} | S_{1}, S_{2}, \dots, S_{M}) = \frac{\hat{P} (C_{i}) \prod_{k = 1}^{M} P (S_{k} | C_{i})}{\sum_{j = 1}^{N} \hat{P} (C_{j}) \prod_{k = 1}^{M} P (S_{k} | C_{j})}$ (12)

Step 5: Judge the credit rating to which the sample belongs by the probability value of each category after fusion.
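Steps 1 to 5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, the prior weight is read as an accuracy-weighted sum of per-category precisions normalized over the categories, the per-sample values passed to the fusion step are assumed to be the probabilities each classifier assigns to each category, and Step 5 is assumed to pick the category with the largest fused probability.

```python
def class_priors(labels, n_classes):
    """Step 1: P(C_i) = n_i / n from training labels (0-based class ids)."""
    counts = [0] * n_classes
    for y in labels:
        counts[y] += 1
    return [c / len(labels) for c in counts]


def difference_measure(accuracies):
    """Step 2, Eq. (9): each source's share of the summed accuracies."""
    total = sum(accuracies)
    return [a / total for a in accuracies]


def prior_weights(delta, precision):
    """Step 3, Eq. (10) under our reading; precision[k][i] = P(S_k | C_i)."""
    n_classes = len(precision[0])
    raw = [sum(d * row[i] for d, row in zip(delta, precision))
           for i in range(n_classes)]
    total = sum(raw)
    return [r / total for r in raw]


def fuse(revised_priors, sample_probs):
    """Step 4, Eq. (12); sample_probs[k][i] is source k's value for C_i."""
    scores = list(revised_priors)
    for source in sample_probs:
        scores = [s * p for s, p in zip(scores, source)]
    total = sum(scores)
    return [s / total for s in scores]


def bif_mcca_predict(labels, accuracies, precision, sample_probs):
    priors = class_priors(labels, len(precision[0]))
    delta = difference_measure(accuracies)
    eps = prior_weights(delta, precision)
    revised = [e * p for e, p in zip(eps, priors)]        # Eq. (11)
    posterior = fuse(revised, sample_probs)
    return posterior.index(max(posterior)), posterior     # Step 5 (assumed argmax)


# Toy setting: two sources and three categories (illustrative values only).
labels = [0] * 6 + [1] * 3 + [2] * 1
accuracies = [0.8, 0.6]
precision = [[0.9, 0.7, 0.5], [0.8, 0.6, 0.4]]
sample_probs = [[0.1, 0.7, 0.2], [0.2, 0.6, 0.2]]
label, posterior = bif_mcca_predict(labels, accuracies, precision, sample_probs)
```

In this toy run the majority category has the largest prior, yet both sources favor the second category for the sample, so the fused posterior selects it.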

4 Experimental design

4.1 Credit dataset description

The dataset used in this study is obtained from an anonymous commercial bank in China, and all data are compiled from customers’ personal loan application records. As shown in Table 1, the commercial bank provides 27,522 credit records of customers’ personal loans over 24 months from July 2018 to July 2020, with 23 features in each record, including the target classification feature and descriptive attribute features. The credit classification feature has five levels: Normal category (C1), Secondary category (C2), Concern category (C3), Suspicious category (C4) and Loss category (C5). The attribute features mainly include 22 features of loan customers in four dimensions: personal information, credit information, loan information and guarantee information.

Table 1
Details of the credit dataset

Dataset	Total features	Total categories	C1 samples	C2 samples	C3 samples	C4 samples	C5 samples	Sample size
Credit	23	5	25516	1445	109	103	349	27522

Among the four dimensions, personal information includes five features: x₁: Customer ID, x₂: Industry Sector Engaged, x₃: Number of Houses, x₄: Month Property Costs, x₅: Family Monthly Income. Credit information includes three features: x₆: Whether Interest is Owed, x₇: Whether Devalue Account, x₈: Safety Coefficient. The loan information includes nine features: x₉: Type of Loan Business, x₁₀: Whether Self-service Loan, x₁₁: Date Code, x₁₂: Approval Deadline, x₁₃: Down Payment Amount, x₁₄: Whether Personal Business Loan, x₁₅: Installment Repayment Method (numerical type), x₁₆: Repayment Type, x₁₇: Installment Repayment Cycle (numerical type). Guarantee information includes: x₁₈: Guarantee the Balance, x₁₉: Account Connection Amount, x₂₀: Security Guarantee Amount, x₂₁: Type of Guarantee, x₂₂: Collateral Value (CNY), x₂₃: Guarantee Method.

4.2 Data preparation

The credit data in the original dataset provided by the commercial bank are mostly incomplete, noisy and of low quality, which is not conducive to direct analysis. Therefore, after acquiring the data source, the data structure is first analyzed to screen out useless and duplicate information. Next, data cleaning is performed, which mainly includes removing or filling missing values and outliers and encoding discrete variables with one-hot encoding. Because the five categories of the original data are imbalanced [35], this paper uses the SMOTE algorithm, which is simple to implement, easy to understand and widely used, to deal with the imbalance [36, 37]. At the same time, a correlation analysis of the credit attribute features is performed to test the rationality of the indicator selection [38]. The final credit features after cleaning and normalization are shown in Table 2.
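The two preprocessing ideas named above, one-hot encoding and SMOTE-style oversampling, can be illustrated with a simplified standard-library sketch. It is not the pipeline used in the experiments: the helper names `one_hot` and `smote_like` are hypothetical, the neighbor count `k` and the sample points are illustrative, and a library implementation would normally be used in practice.

```python
import random


def one_hot(values):
    """Map each discrete value to a 0/1 indicator vector (sorted categories)."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return encoded, categories


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating toward a near neighbour.

    Needs at least two minority samples; `k` is an illustrative setting.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Illustrative inputs: a discrete feature and three minority-class points.
encoded, categories = one_hot(["mortgage", "security", "mortgage"])
minority = [[1.0, 2.0], [2.0, 3.0], [3.0, 1.0]]
synthetic = smote_like(minority, n_new=5)
```

Because each synthetic point lies on a segment between two existing minority samples, the new points stay inside the region already occupied by that category.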

Table 2
Description of credit attribute characteristics of commercial bank customers

Dimension	Variable	Attribute
Personal information	x₃: Number of Houses	x₃ ∈ (0, 2)
	x₄: Month Property Costs	x₄ ∈ (0, 350000)
	x₅: Family Monthly Income	x₅ ∈ (1, +∞)
Credit information	x₆: Whether Interest is Owed	x₆ = Y denotes interest owed; x₆ = N denotes no interest owed
	x₇: Whether Devalue Account	x₇ = Y denotes Yes; x₇ = N denotes No
	x₈: Safety Coefficient	x₈ ∈ {100, 80, 75, 70, 60, 50}
Loan information	x₁₃: Down Payment Amount	x₁₃ ∈ (0, 7000000)
	x₁₅: Installment Repayment Method	x₁₅ = 1 denotes Equal principal payments; x₁₅ = 2 denotes Equal loan payments
	x₁₂: Approval Deadline	x₁₂ ∈ (1, 11000)
Guarantee information	x₁₈: Guarantee the Balance	x₁₈ ∈ (0, 99000000)
	x₂₂: Collateral Value (CNY)	x₂₂ ∈ (0, +∞)
	x₂₃: Guarantee Method	x₂₃ ∈ {security, mortgage}
Target variable	Y: Five-Classification	Y ∈ {Normal, Secondary, Concern, Suspicious, Loss}

4.3 Performance metrics

The confusion matrix comprehensively reflects the performance of a model, and many metrics can be derived from it. To verify the classification performance of the model, we choose four common classification assessment metrics: Accuracy, Precision, Recall, and F1-score. In the common binary confusion matrix, the personal credit of customers is divided into two categories: “good” and “bad”. Each customer then falls into one of four outcomes: True Positive (TP), True Negative (TN), False Negative (FN) or False Positive (FP), as shown in Table 3. For the multi-category credit assessment problem, each category is regarded as “positive” in turn, and the other categories as “negative” [39]. The metrics derived from the confusion matrix are defined as follows [40]:

Table 3
Confusion matrix

	Actually Positive	Actually Negative
Predicted Positive	True Positive	False Positive
Predicted Negative	False Negative	True Negative

$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$ (13)

$Precision = \frac{TP}{TP + FP}$ (14)

$Recall = \frac{TP}{TP + FN}$ (15)

$F_{1} - score = \frac{2 * Precision * Recall}{Precision + Recall}$ (16)

Accuracy represents the overall performance of a classifier; Precision mainly reflects the reliability of the classifier’s positive outputs; Recall reflects how completely the classifier covers the actual positives; F1-score is a reliable comprehensive index for evaluating unbalanced data, and is the harmonic mean of precision and recall, assuming that both are equally important. Precision and recall tend to trade off against each other: in general, recall tends to be low when precision is high, and precision tends to be low when recall is high.
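The one-vs-rest reading of Equations (14) to (16) for a multi-class problem can be sketched as below. The function name `per_class_metrics`, the row/column convention (rows are true categories, columns are predictions) and the 3 × 3 matrix are assumptions for illustration; overall accuracy is computed here as the share of correctly classified samples.

```python
def per_class_metrics(confusion):
    """confusion[t][p] = number of samples of true class t predicted as p."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    report = []
    for c in range(n):
        tp = confusion[c][c]
        fp = sum(confusion[t][c] for t in range(n)) - tp   # predicted c, but wrong
        fn = sum(confusion[c]) - tp                        # true c, but missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report.append((precision, recall, f1))
    return correct / total, report


# Illustrative 3-class confusion matrix (not data from this study).
confusion = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 1, 9],
]
accuracy, report = per_class_metrics(confusion)
```

Each tuple in `report` holds the precision, recall and F1-score of one category treated as “positive”, which matches the layout of the per-category results reported in Section 5.1.1.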

4.4 Implementation and assessment details

To enhance the robustness of the experiments, a hold-out validation strategy is used: the dataset is randomly divided into a testing set containing 20% of the data and a training set containing the remaining 80%. The training data are further split for 10-fold cross-validation, so that in each fold 90% is used for training and 10% for validation [41, 42]. The final assessment is performed on the test set, which remains unused during the calibration process. All experiments were conducted on a PC running Python 3.7 with a 3.0 GHz Intel Core i7 processor, 32 GB of RAM, and the Microsoft Windows 10 operating system.
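The splitting scheme can be sketched with index lists only. This is an illustrative reading of the protocol, assuming a plain random (non-stratified) split; the function name `holdout_then_kfold`, the seed and the sample count of 1000 are hypothetical.

```python
import random


def holdout_then_kfold(n_samples, test_fraction=0.2, n_folds=10, seed=0):
    """Hold out a test set, then build k-fold splits on the remaining indices."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    test_idx, dev_idx = indices[:n_test], indices[n_test:]
    # Deal the development indices into n_folds roughly equal folds.
    folds = [dev_idx[i::n_folds] for i in range(n_folds)]
    splits = []
    for i in range(n_folds):
        val_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train_idx, val_idx))
    return test_idx, splits


test_idx, splits = holdout_then_kfold(1000)
```

With 1000 samples this gives 200 test indices and ten folds in which 720 indices are used for training and 80 for validation, that is, the 90%/10% division of the training portion; the test indices never appear in any fold.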

5 Empirical analysis and results

Our experiments have two objectives. The first is to verify that, compared with a single classifier, the BIF-MCCA ensemble assessment model can effectively assess the credit risk level of commercial banks’ customers. The second is to verify that, compared with other multi-class classification ensemble models, BIF-MCCA improves the accuracy, efficiency and generalizability of multi-class classification of commercial banks’ customer credit ratings. Section 5.1 shows the assessment performance of the basic classifiers, and Section 5.2 shows the assessment performance of BIF-MCCA and other multi-class classification ensemble models.

5.1 Baseline classifier results

This section aims to assess the performance of the basic classifiers. In Section 5.1.1, we compare the models’ assessment performance at the five classification levels using precision, recall, and F1-score. In Section 5.1.2, the overall classification accuracy of each model is measured.

5.1.1 Assessment performance of the model at each classification level

The pre-processed data are substituted into each benchmark classification model and the BIF-MCCA model constructed in this paper to obtain the assessment standard results of precision, recall and F1-score values of each model in the five credit levels, as can be seen in Table 4 and Fig. 2.

Table 4
Precision, recall and F1-score of baseline classifiers and BIF-MCCA model under the five credit categories

Model/Categories		C1	C2	C3	C4	C5
Panel A: Model performance in terms of Precision
Baseline Models	LR	0.6399	0.3533	0.8755	0.6694	0.6237
	DT	0.7876	0.5617	0.8407	0.6319	0.8168
	KNN	0.6229	0.8338	0.7525	0.7702	0.7354
	SVM	0.8665	0.6801	0.7647	0.7650	0.6718
	RF	0.8329	0.7204	0.8627	0.7493	0.7583
	XGboost	0.8769	0.8111	0.826	0.7676	0.6921
The Proposed	BIF-MCCA	0.8995	0.8692	0.8879	0.8352	0.8371
Panel B: Model performance in terms of Recall
Baseline Models	LR	0.8668	0.8162	0.7117	0.5280	0.4681
	DT	0.8639	0.8229	0.794	0.7356	0.5478
	KNN	0.7632	0.777	0.7207	0.7126	0.7372
	SVM	0.8188	0.7965	0.8991	0.6526	0.6667
	RF	0.8769	0.8171	0.8341	0.7885	0.6395
	XGboost	0.8617	0.7951	0.8963	0.7313	0.7234
The Proposed	BIF-MCCA	0.8784	0.8804	0.8981	0.8273	0.8316
Panel C: Model performance in terms of F1-score
Baseline Models	LR	0.7363	0.4931	0.7852	0.5904	0.5348
	DT	0.824	0.6677	0.8167	0.6798	0.6558
	KNN	0.6859	0.8044	0.7362	0.7403	0.7363
	SVM	0.8419	0.7337	0.8265	0.7043	0.6692
	RF	0.8543	0.7657	0.8482	0.7684	0.6938
	XGboost	0.6920	0.8030	0.8597	0.7490	0.7074
The Proposed	BIF-MCCA	0.8888	0.8747	0.8929	0.8312	0.8343

Fig. 2

Precision, Recall and F1-score of baseline classifiers and BIF-MCCA model under the five credit categories.

In terms of precision: in the C1 credit category, XGboost has the highest precision, followed by SVM and RF, then the BIF-MCCA model, and the KNN model has the lowest precision. In the C2 credit category, the BIF-MCCA model has the highest precision, followed by the KNN model and the XGboost model, and the worst are SVM and DT, with precisions of only 0.6801 and 0.5617 respectively. In the C3 credit category, the BIF-MCCA model has the highest precision, followed by RF and DT, and the lowest is KNN. In the C4 credit category, the precision of the BIF-MCCA model is also the highest, followed by KNN and XGboost, and DT is the lowest. In the C5 credit category, the BIF-MCCA model still has the highest precision, followed by DT and RF, and the lowest is SVM. To sum up, the precision of the BIF-MCCA model is the highest in all four categories except for the C1 category, while DT and KNN most frequently have the lowest precision, which shows that they perform poorly on precision across the five categories compared with the other models.

In terms of recall: in the C1 credit category, the BIF-MCCA model has the highest recall, followed by RF and DT, and the lowest is KNN. In the C2 credit category, the BIF-MCCA model has the highest recall, the DT model ranks second, followed by RF, and the lowest is KNN. In the C3 credit category, the BIF-MCCA model ranks fourth; the SVM model has the highest recall, followed by XGboost and RF, and KNN has the lowest recall. In the C4 credit category, the BIF-MCCA model has the highest recall, followed by RF and DT, and SVM has the lowest recall. In the C5 credit category, the recall of the BIF-MCCA model is also the highest, KNN ranks second, and DT is the lowest. To sum up, the recall of the BIF-MCCA model is the highest in all four categories except for the C3 credit category.

In terms of F1-score: in the C1 credit category, the BIF-MCCA model has the highest F1-score, the RF model is in second place, DT is in third place, and KNN is the lowest. In the C2 credit category, the BIF-MCCA model also has the highest F1-score, followed by KNN and XGboost, and the lowest is DT. In the C3 credit category, XGboost has the highest F1-score, RF is second, BIF-MCCA is third and the lowest is KNN. In both the C4 and C5 credit categories, the BIF-MCCA model has the highest F1-score. To sum up, the F1-score of the BIF-MCCA model is the highest in all four categories except for the C3 credit category.

In summary, comparing the benchmark classification models and the BIF-MCCA model on the three assessment criteria of precision, recall and F1-score, the BIF-MCCA model has the highest precision in categories C2, C3, C4 and C5, and the highest recall and F1-score in all four categories except for the C3 category. The BIF-MCCA model constructed in this paper therefore has higher classification performance than the benchmark models across the five credit ratings.

5.1.2 Overall accuracy of the assessment model

In this subsection, accuracy is used to measure the overall assessment performance of each model; the results are shown in Table 5 and Fig. 3. Compared with the multi-class customer credit assessment models built on LR, DT, KNN, SVM, RF and XGboost, the ensemble method clearly performs better. The overall accuracy of the BIF-MCCA model constructed in this paper reaches 85.56%. The largest relative improvement, 35.36%, is over the LR model, and even relative to the best-performing benchmark, XGboost, the accuracy improves by 6.62%.

Table 5
The overall accuracy of the personal credit assessment model

Model Type	LR	DT	KNN	SVM	RF	XGboost	BIF-MCCA
Accuracy	63.21%	72.95%	74.15%	76.15%	78.60%	80.25%	85.56%
Improvement Rate	35.36%	17.29%	15.39%	12.36%	8.85%	6.62%	–

Fig. 3

Overall accuracy of credit assessment models.
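The improvement rates in Table 5 appear to be relative gains, that is, (accuracy of BIF-MCCA minus accuracy of the benchmark) divided by the accuracy of the benchmark. The short sketch below checks this reading against the accuracies reported in the table; the helper name `improvement_rate` is hypothetical.

```python
# Accuracies (in percent) as reported in Table 5.
benchmark_accuracy = {
    "LR": 63.21, "DT": 72.95, "KNN": 74.15,
    "SVM": 76.15, "RF": 78.60, "XGboost": 80.25,
}
proposed_accuracy = 85.56  # BIF-MCCA


def improvement_rate(baseline: float, proposed: float) -> float:
    """Relative improvement of `proposed` over `baseline`, in percent."""
    return (proposed - baseline) / baseline * 100.0


rates = {name: round(improvement_rate(acc, proposed_accuracy), 2)
         for name, acc in benchmark_accuracy.items()}
print(rates)
```

Under this assumption the computed values agree with the Improvement Rate row of Table 5 to two decimal places.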

5.2 Comparative analysis of BIF-MCCA with other classification models

We select several classical and recent assessment models for comparison with the BIF-MCCA model. Among them, CatBoost is a classical ensemble credit assessment method, while CFHM, OCHE and MIFCA are recent ensemble credit assessment methods.

CatBoost is a powerful open-source GBDT-based technique that achieves promising results in a variety of machine learning tasks [43]. CFHM is a fusion technique that considers intra-attribute and inter-attribute weight optimization and can be built on any of the benchmark classifiers [28]. The effect of parameter tuning on its classification performance is investigated in the experiments: the parameter γ ∈ [0,1] is varied with a step size of 0.1, and γ = 0.1 is finally taken as the default value. OCHE is a novel credit scoring model based on a selective heterogeneous ensemble developed by Xia et al. [16], in which both the base models and the ensemble framework have hyper-parameters. For the ensemble framework, they set the penalty parameter d in Equation (4) to 3; the benchmarks and base models have many hyper-parameters, which are listed in detail in Table 4 of Reference [16]. The MIFCA model does not consider the conflicts and differences among multiple sources of information, and integrates 6 different types of classifiers based on D-S evidence theory to reduce the uncertainty of the model. The base classifiers of the MIFCA model adopt default parameters [39]. The comparison results are presented in Table 6 and Fig. 4.

As shown in Table 6 and Fig. 4, the results of the five classification algorithms on the real credit dataset show that the constructed BIF-MCCA model has the highest classification performance on every assessment criterion. The performance of CFHM is similar to that of OCHE, but both models incur a time cost for parameter tuning. By contrast, all components of the BIF-MCCA model constructed in this paper run under default parameters, so no parameter tuning is needed and the BIF-MCCA model is more concise and lightweight.

Table 6
Comparison of the performance by other credit classification methods

Group	Model	Accuracy	Precision	Recall	F1-score
Advanced Methods	CatBoost	0.8090	0.8085	0.8114	0.8079
Advanced Methods	CFHM	0.8185	0.8086	0.8114	0.8099
Advanced Methods	OCHE	0.8234	0.8230	0.8270	0.8242
Advanced Methods	MIFCA	0.8348	0.8434	0.8286	0.8358
The Proposed	BIF-MCCA	0.8556	0.8658	0.8632	0.8644

Fig. 4

Classification performance results of LR, CatBoost, CFHM, OCHE and BIF-MCCA in four assessment criteria.

Meanwhile, BIF-MCCA considers the conflicts and differences among multi-source information. This overcomes the limitation that MIFCA is only applicable when the differences among the information sources are not obvious, and gives BIF-MCCA stronger generalization performance. The above experimental analysis thus verifies the strong performance of the BIF-MCCA ensemble assessment model constructed in this paper.

5.3 Statistical test results

The superiority of the BIF-MCCA model constructed in this paper has been verified in Sections 5.1 and 5.2. To further illustrate the reliability of the experiments, we use the Friedman test advocated by Demšar to test the significance of the differences among the methods [44]. The Friedman test is a typical non-parametric statistical test. In the test, 11 classification models were ranked on each evaluation criterion: six base classifiers, four ensemble models, and the proposed model [45].

Table 7 summarizes the classifier rankings and the significance test results of the Friedman test. The test statistic is higher than the Chi-square critical value and the p-value is lower than the alpha value (0.05), so the null hypothesis is rejected and the robustness of the model is verified.

Table 7
Classifier ranking results of Friedman test

Classifier	LR	DT	KNN	SVM	RF	XGboost	CatBoost	CFHM	OCHE	MIFCA	BIF-MCCA
Accuracy	11	10	9	8	7	6	5	4	3	2	1
Precision	11	10	9	8	7	6	5	4	3	2	1
Recall	11	9	10	8	7	6	4.5	4.5	3	2	1
F1-score	11	10	9	8	6	7	5	4	3	2	1
AvgRank	11	9.75	9.25	8	6.75	6.25	4.875	4.125	3	2	1

Note: the Friedman test statistic is 39.639 with p-value < 0.001; the alpha value is 0.05; the Chi-square critical value is 7.81.
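The average ranks and the test statistic can be recomputed from the rank table as a rough check. The sketch below assumes the standard Friedman chi-square form, χ² = 12N / (k(k+1)) · (Σ R_j² − k(k+1)²/4), with k = 11 classifiers, N = 4 criteria treated as blocks, and no tie correction. The degrees of freedom used for the p-value (k − 1 = 10) are an assumption of this sketch and may not match the setting behind the critical value quoted in the note. The helper `chi2_sf_even_df` is hypothetical and uses the closed-form chi-square survival function that holds for even degrees of freedom.

```python
import math

# Ranks from Table 7 (columns: LR, DT, KNN, SVM, RF, XGboost,
# CatBoost, CFHM, OCHE, MIFCA, BIF-MCCA).
ranks = {
    "Accuracy":  [11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1],
    "Precision": [11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1],
    "Recall":    [11, 9, 10, 8, 7, 6, 4.5, 4.5, 3, 2, 1],
    "F1-score":  [11, 10, 9, 8, 6, 7, 5, 4, 3, 2, 1],
}

n_blocks = len(ranks)                        # N = 4 assessment criteria
k = len(next(iter(ranks.values())))          # k = 11 classifiers

# Average rank of each classifier across the four criteria.
avg_rank = [sum(col) / n_blocks for col in zip(*ranks.values())]

# Friedman chi-square statistic (no tie correction).
chi2 = (12 * n_blocks / (k * (k + 1))
        * (sum(r * r for r in avg_rank) - k * (k + 1) ** 2 / 4))


def chi2_sf_even_df(x: float, df: int) -> float:
    """Survival function of the chi-square distribution for even df."""
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i)
                                 for i in range(df // 2))


p_value = chi2_sf_even_df(chi2, k - 1)       # assumed df = k - 1 = 10
print(avg_rank)
print(round(chi2, 3), p_value)
```

Under these assumptions the average ranks match the AvgRank row of Table 7, the statistic comes out at about 39.65, close to the reported 39.639, and the p-value is well below 0.001.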

6 Conclusion

Improving the classification performance of customer credit assessment and accurately identifying potentially valuable and defaulting customers are crucial for commercial banks and other financial institutions to avoid risks. This paper proposes a multi-class ensemble assessment model for personal credit rating in commercial banks based on multi-source information fusion, and verifies its effectiveness on a real dataset. The results show that BIF-MCCA has the following advantages:

The proposed BIF-MCCA model integrates five different types of benchmark classifiers and achieves better classification accuracy and stronger generalization ability than a single classifier, which is more sensitive to the data.

Considering the conflicts and discrepancies between multiple sources of information, a new method for correcting the category prior information using difference measures is proposed, which makes full use of the prior information and thus improves the classification performance of the model.

The benchmark classifiers adopted in this study use default parameters, which removes the time cost of parameter tuning and makes the model more lightweight, while improving the accuracy, efficiency and generalizability of the multi-class classification of commercial banks’ customer credit ratings.

Acknowledgments

This work was partially supported by the Major Project of the National Social Science Fund of China (18ZDA104).

References

1. Jin, Liu, Zhang and Lou, A novel multi-stage ensemble model with multiple k-means-based selective undersampling: an application in credit scoring, Journal of Intelligent & Fuzzy Systems 40(5) (2021), 9471–9484.

2. Yang D.Q., Zhang W.Y., Wu, Ablanedo-Rosas J.H., Yang L.X. and Yu W.Z., A novel multi-stage ensemble model with fuzzy clustering and optimized classifier composition for corporate bankruptcy prediction, Journal of Intelligent & Fuzzy Systems 43(3) (2021), 4169–4185.

3. Fernandes G.B. and Rinaldo, Spatial dependence in credit risk and its improvement in credit scoring, European Journal of Operational Research 249(2) (2016), 517–524.

4. Dumitrescu E.I., Hué, Hurlin and Tokpavi, Machine learning for credit scoring: improving logistic regression with non-linear decision-tree effects, European Journal of Operational Research 297 (2022), 1178–1192.

5. Silva, Pereira and Magalhaes, A class of categorization methods for credit scoring models, European Journal of Operational Research 296(1) (2022), 323–331.

6. Caigny A.D., Coussement and Bock K.W.D., A new hybrid classification algorithm for customer churn prediction based on logistic regression and decision trees, European Journal of Operational Research 269(2) (2018), 760–772.

7. Dastile, Celik and Potsane, Statistical and machine learning models in credit scoring: A systematic literature survey, Applied Soft Computing 91 (2020), 106263.

8. Akkoç, An empirical comparison of conventional techniques, neural networks and the three-stage hybrid adaptive neuro fuzzy inference system (ANFIS) model for credit scoring analysis: the case of Turkish credit card data, European Journal of Operational Research 222(1) (2012), 168–178.

9. Harris, Credit scoring using the clustered support vector machine, Expert Systems with Applications 42(2) (2015), 741–750.

10. Zhang, He and Zhang, A novel multi-stage hybrid model with enhanced multi-population niche genetic algorithm: An application in credit scoring, Expert Systems with Applications 121 (2019), 221–232.

11. Zhang W.Y., Yang D.Q., Zhang, Ablanedo-Rosas J.H., Wu and Lou, A novel multi-stage ensemble model with enhanced outlier adaptation for credit scoring, Expert Systems with Applications 165 (2021), 113872.

12. Singh, Kumar, Srinivasa K.G., Maini, Ahuja and Jain, A multi-level classification and modified PSO clustering based ensemble approach for credit scoring, Applied Soft Computing 111 (2021), 107687.

13. Tripathi, Edla D.R. and Cheruku, Hybrid credit scoring model using neighborhood rough set and multi-layer ensemble classification, Journal of Intelligent & Fuzzy Systems 34(3) (2018), 1543–1549.

14. Zhu, Xie and Wang G.J., Comparison of individual, ensemble and integrated ensemble machine learning methods to predict China’s SME credit risk in supply chain finance, Neural Computing and Applications 28(1) (2017), 41–50.

15. Chen and Wang L.X., Research on the Adaptive Multi-classification of Commercial Credit in the Manufacturing Enterprises, Industrial Engineering and Management 23(5) (2018), 162–168.

16. Xia, Zhao, He, Li and Niu, A novel tree-based dynamic heterogeneous ensemble method for credit scoring, Expert Systems with Applications 159 (2020), 113615.

17. Kulkarni S.V. and Dhage S.N., Advanced credit score calculation using social media and machine learning, Journal of Intelligent & Fuzzy Systems 36(3) (2019), 1–8.

18. Papouskova and Hajek, Two-stage consumer credit risk modelling using heterogeneous ensemble learning, Decision Support Systems 118 (2019), 33–45.

19. Zhang, Yang and Zhang, A new hybrid ensemble model with voting-based outlier detection and balanced sampling for credit scoring, Expert Systems with Applications 174 (2021), 114744.

20. Teles, Rodrigues, Saleem, Kozlov and Rabêlo, Machine learning and decision support system on credit scoring, Neural Computing and Applications 32(14) (2020), 9809–9826.

21. Pławiak, Abdar and Acharya U.R., Application of new deep genetic cascade ensemble of SVM classifiers to predict the Australian credit scoring, Applied Soft Computing 84 (2019), 105740.

22. Abdelmoula A.K., Bank credit risk analysis with k-nearest-neighbor classifier: case of Tunisian banks, Journal of Accounting & Management Information Systems 14(1) (2015), 79–106.

23. Breiman, Random forests, Machine Learning 45 (2001), 5–32.

24. Chen and Guestrin, XGBoost: A Scalable Tree Boosting System, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD’16, San Francisco, California, USA, (2016), 785–794.

25. Maillo, Ramírez, Triguero and Herrera, kNN-IS: An Iterative Spark-based design of the k-Nearest Neighbors classifier for big data, Knowledge-Based Systems 117 (2016), 3–15.

26. Envelope, Envelope and Ramos, Profit scoring for credit unions using the multilayer perceptron, xgboost and tabnet algorithms: Evidence from Peru, Expert Systems with Applications 213 (2023), 119201.

27. Shafer, A mathematical theory of evidence, Princeton University Press, Princeton, (1976).

28. Lin and Yixiao, An approach of classifiers fusion based on hierarchical modifications, Applied Intelligence 52(6) (2021), 6464–6476.

29. Xia, Liu and Li Y.Y., A boosted decision tree approach using Bayesian hyper-parameter optimization for credit scoring, Expert Systems with Applications 78 (2017), 225–241.

30. Gunnarsson B.R., Broucke S.V., Baesens, Óskarsdóttir and Lemahieu, Deep learning for credit scoring: do or don’t?, European Journal of Operational Research 295(2) (2021), 292–305.

31. Blumenstock, Don’t forget people in the use of big data for development, Nature 561(7722) (2018), 170–172.

32. Onay and Öztürk, A review of credit scoring research in the age of Big Data, Journal of Financial Regulation and Compliance 26(3) (2018), 382–405.

33. Giw, Li and Xin M.C., A novel selective naïve Bayes algorithm, Knowledge-Based Systems 192 (2020), 105361.

34. Xia, Liu, Li and Liu, A boosted decision tree approach using Bayesian hyper-parameter optimization for credit scoring, Expert Systems with Applications 78 (2017), 225–241.

35. Liu C.L. and Hsieh P.Y., Model-based synthetic sampling for imbalanced data, IEEE Transactions on Knowledge and Data Engineering 32(8) (2020), 1543–1556.

36. Brown and Mues, An experimental comparison of classification algorithms for imbalanced credit scoring data sets, Expert Systems with Applications 39(3) (2012), 3446–3453.

37. Shen, Zhao, Kou and Alsaadi F.E., A new deep learning ensemble credit risk evaluation model with an improved synthetic minority oversampling technique, Applied Soft Computing 98 (2021), 106852.

38. Nalic, Martinovic and Zagar, New hybrid data mining model for credit scoring based on feature selection algorithm and ensemble classifiers, Advanced Engineering Informatics 45 (2020), 101130.

39. Wang T.H., Liu R.J. and Qi G.H., Multi-classification assessment of bank personal credit risk based on multi-source information fusion, Expert Systems with Applications 191 (2022), 116236.

40. Cheng-Hsiung and Cheng-Kui, A Hybrid Machine Learning Model for Credit Approval, Applied Artificial Intelligence 35(15) (2021), 1439–1465.

41. Kozodoi, Lessmann, Papakonstantinou, Gatsoulis and Baesens, A multi-objective approach for profit-driven feature selection in credit scoring, Decision Support Systems 120 (2019), 106–117.

42. Yao, Wang, Zhang and Yan, A hybrid model with novel feature selection method and enhanced voting method for credit scoring, Journal of Intelligent & Fuzzy Systems 42 (2022), 2565–2579.

43. Prokhorenkova, Gusev, Vorobev, Dorogush A.V. and Gulin, CatBoost: unbiased boosting with categorical features, in: Advances in Neural Information Processing Systems (2018), 6638–6648.

44. Demšar, Statistical comparisons of classifiers over multiple data sets, Journal of Machine Learning Research (2006), 1–30.

45. Friedman, A Comparison of Alternative Tests of Significance for the Problem of $m$ Rankings, Annals of Mathematical Statistics 11 (1940), 86–92.

