Abstract
In this study, we propose a new classification method by adopting some ideas originating from the fuzzy comprehensive evaluation (FCE). To turn the FCE into a classifier, the class labels in classification problems are regarded as the evaluation remarks in the FCE, and the attributes in these two domains are regarded as consistent. Then, to implement the FCE model B = W ∘ R and obtain an accurate classification result, on the one hand, a learning algorithm, which is based on the joint distribution of attribute values and is dynamic, is proposed to construct the fuzzy relational matrix R; on the other hand, equal weight is adopted to constitute the weight vector W. Meanwhile, for a continuous dataset, the discretization method and the determination of the discretization class number corresponding to the proposed classifier are discussed. The proposed classifier not only innovatively extends the FCE to data mining but also has its own classification advantages: it is easy to operate and has good interpretability. Finally, we perform some numerical experiments using publicly available datasets, and the experimental results demonstrate that the proposed classifier outperforms some existing classifiers.
Introduction
Fuzzy set theory was proposed by Zadeh in 1965 [1]. It was designed to supplement the interpretation of linguistic or measurement uncertainties in real-world phenomena. On the basis of fuzzy mathematics, Wang put forward the fuzzy comprehensive evaluation (FCE) in 1980 [2], a method that comprehensively evaluates the subordinate status of objects from multiple attributes by applying the principle of fuzzy relation synthesis. The FCE has been applied to many scientific fields [3–12], such as customer satisfaction, teaching performance, water quality, and working fatigue state.
The FCE is denoted as FCE =〈U, V, R, W〉, where U represents a factor set, V represents a decision set, R is a fuzzy relational matrix consisting of membership degrees, and W is a weight vector corresponding to the factor set U. After the fuzzy synthesis by the FCE model B = W ∘ R, the evaluation result is obtained. Research on the FCE mainly focuses on the following aspects. a) In terms of the FCE model, Wei et al. [14] introduced a trustworthy degree into the fuzzy relational matrix to address the problem that the total judgment numbers of the attributes differ. Reference [15] came up with a dynamic FCE to achieve real-time risk assessment. To improve the two-layer FCE, Xu et al. [16] proposed the multi-source FCE, which does not restrict whether the factor sets intersect. In addition, references [17–20] studied nonlinear FCE models, the FCE model with prominent impact factors, and the HFLTS-DEMATEL FCE model. b) In determining membership degrees, Yang et al. [21] put forward a combined membership function based on the variance-covariance optimized combination method to avoid subjectivity in choosing the membership function. c) In determining weights, there have also been some improved methods. In [22], the variation coefficient method was used to calculate weights, so as to reduce the workload and avoid adverse effects from abnormal values and weight distribution equalization. Rezaei [23] proposed a best-worst method, which requires fewer comparisons and passes the consistency check more easily than the analytic hierarchy process (AHP). Chiao [24] extended the AHP to type-2 fuzzy sets; that is, the pairwise comparison of decision linguistic judgments was characterized by type-2 fuzzy sets. Reference [25] used a formula method to determine weights. d) In terms of the evaluation criteria, Wang et al. [26] presented a method to address the inefficiency problem of the principle of maximum membership.
With the advent of the data deluge era, data mining has received increasing attention. In practice, we noted that the FCE is similar to the classification technique, which is a very important data mining tool and one of the focuses and hotspots of data mining research. Specifically, both deal with objects having multiple attributes and output a choice from a list of alternatives (i.e., evaluation remarks or class labels). Thus, the FCE model is actually a classifier. Its classification principle is to assign the most probable class label to an object by comprehensively considering the membership degrees of every attribute to the class labels. Therefore, this classifier has good interpretability. According to the aforementioned related works, we found that using the FCE to solve classification problems has not yet been studied, and this new research direction will extensively extend the application of the FCE, such that the FCE innovatively advances in data mining.
Let us give an example to illustrate. Table 1 shows some cases from the Iris dataset, which is one of the best known classification datasets. We will see it from the perspective of the FCE.
Table 1. Some cases from the Iris dataset
*TID: transaction id.
In Table 1, the FCE will consider four attributes, Sepal Length, Sepal Width, Petal Length, and Petal Width, and output a choice from {Iris-setosa, Iris-versicolour, Iris-virginica}. Next, we need to implement the FCE model B = W ∘ R to obtain the output. However, the fuzzy relational matrix R and the weight vector W are unknown to us now. Therefore, the key to using the FCE to solve classification problems lies in learning a reasonable R and a suitable W from the classification sample data, and this is exactly the research objective of this paper.
In this study, the joint distribution dynamic membership degree is proposed to construct the fuzzy relational matrix R, and equal weight is considered to constitute the weight vector W. Hence, our proposed new classification method is named the fuzzy comprehensive evaluation based on the joint distribution dynamic membership degree (FCE-JDDMD). To test the effectiveness of the construction of R and W, this study performs several numerical experiments and the experimental results demonstrate that the proposed FCE-JDDMD classifier outperforms some existing classifiers.
In addition, since the FCE model is very simple and the construction methods of R and W proposed in this paper are effortless (in other words, there are no complicated algorithms in the proposed classifier), the proposed FCE-JDDMD classifier is easy to operate. Currently, there is copious literature proposing new, more accurate classifiers, but a classifier that is both simple and more accurate remains particularly precious.
The main contributions of this study are as follows. a) This study proposes a new, simple and outperforming classification method named the FCE-JDDMD. b) This study puts forward a learning algorithm for the membership degree, which is based on the joint distribution of attribute values and is dynamic. c) This study extends the application of the FCE, such that the FCE advances in data mining.
The remainder of this study is organized as follows. In Section 2, the preliminary concepts in the FCE are described. In Section 3, the new classification method is introduced, and the learning algorithm for the membership degree is formally discussed. The experimental results are presented in Section 4, and conclusions are drawn in Section 5.
The FCE consists of four components: a factor set, a decision set, a fuzzy relational matrix, and a weight vector [13]. The specific explanation of each component is as follows.

Factor set. The factor set is composed of all influence factors or evaluation indices, and it is written as U = {u1, u2, ⋯, un}, in which n is the number of influence factors.

Decision set. The decision set consists of all evaluation remarks or evaluation grades, and it is written as V = {v1, v2, ⋯, vm}, in which m is the number of evaluation remarks.

Fuzzy relational matrix. We denote the membership degree of factor ui to evaluation remark vj as rij (i = 1,2,⋯,n; j = 1,2,⋯,m); then the fuzzy relational matrix is expressed as R = (rij)n×m, the n × m matrix whose (i, j) entry is rij.
There are several methods to determine the membership degree [13, 27], and the fuzzy statistic method is one of them. Because this method is a basic theory used in this study, we provide a detailed introduction as follows. The fuzzy statistic method was proposed by Zhang in 1981 [13, 28]. In [13, 28], age is the universe of discourse (denoted as X). To determine the membership degree of x0 ∈ X to young (which is a fuzzy set), Zhang designed a questionnaire and invited every respondent to answer with an age interval of young in their own opinion. Let G be the total number of collected age intervals and g be the number of age intervals that cover x0; then, g/G is called the subordinate frequency of x0 to young. Much practice points out that the subordinate frequency tends to remain stable as G increases, and this stable subordinate frequency is the membership degree of x0 to young. The fuzzy statistic method can be briefly summarized as follows. Let X be the universe of discourse, x0 ∈ X, and F be a fuzzy set. A total of G experiments are carried out, and in each experiment, one needs to decide whether x0 belongs to the fuzzy set F or not. If there are g experimental results revealing x0 ∈ F, then the membership degree of x0 to the fuzzy set F is g/G.

Weight vector. The influence degree of each factor on the evaluation result may differ: some factors may have a smaller influence, while others may have a greater one. Let wi be the weight of factor ui; then, W = (w1, w2, ⋯, wn) is called the weight vector, in which wi ≥ 0 and w1 + w2 + ⋯ + wn = 1.
Based on the above four components, the FCE can be denoted as FCE =〈U, V, R, W〉. After establishing these components, the FCE model B = W ∘ R = (b1, b2, ⋯, bm) can be implemented, where B is called the evaluation vector, ∘ is the fuzzy composition operator, and bj is the membership degree of the evaluated object to evaluation remark vj (j = 1,2,⋯,m). There are several fuzzy composition operators [13, 27], and we use the weighted average operator in this paper, that is, bj = Σ_{i=1}^{n} wi·rij (j = 1,2,⋯,m). The FCE model is illustrated in Fig. 1.

Fig. 1. The FCE model.
To explain the FCE clearly, a simple example is shown here. For a given iris plant sample, after measuring, we have its Sepal Length = 5.2, Sepal Width = 3.7, Petal Length = 1.8, and Petal Width = 0.5. We know that there are three types of iris plants, Iris-setosa, Iris-versicolour, and Iris-virginica, and now, we need to decide which type this given sample belongs to. In this example, four influence factors constitute the factor set U = {Sepal Length, Sepal Width, Petal Length, Petal Width}, and three evaluation remarks constitute the decision set V = {Iris-setosa, Iris-versicolour, Iris-virginica}. We assume its fuzzy relational matrix and weight vector are respectively calculated to be (for the calculation methods, refer to the above presentation)
Then, according to the FCE model B = W ∘ R and the weighted average operator, we obtain
Due to 0.59 > 0.24 > 0.17, which implies the membership degree to Iris-setosa is the largest one, the evaluation result is Iris-setosa by the principle of maximum membership. That is, the given iris plant sample belongs to Iris-setosa.
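To make the composition step concrete, here is a minimal Python sketch of B = W ∘ R with the weighted average operator for this example. The values of R and W below are hypothetical illustrations chosen only so that the result reproduces the example's B = (0.59, 0.24, 0.17); they are not the paper's actual computed values.

```python
# Minimal sketch of B = W ∘ R with the weighted average operator
# b_j = sum_i w_i * r_ij. R and W hold hypothetical illustrative values.
import numpy as np

# Rows: Sepal Length, Sepal Width, Petal Length, Petal Width.
# Columns: Iris-setosa, Iris-versicolour, Iris-virginica.
R = np.array([[0.60, 0.25, 0.15],
              [0.55, 0.25, 0.20],
              [0.62, 0.23, 0.15],
              [0.59, 0.23, 0.18]])
W = np.array([0.25, 0.25, 0.25, 0.25])    # weights sum to 1

B = W @ R                                 # B = (0.59, 0.24, 0.17)
labels = ["Iris-setosa", "Iris-versicolour", "Iris-virginica"]
print(labels[int(np.argmax(B))])          # principle of maximum membership
```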
Before introducing the proposed method, we give a basic presentation on classification:
The classification method [29] is a common data mining technique, and many classification methods have been proposed, such as the support vector machine, decision tree, artificial neural network, and naive Bayes rule. Classification has been extensively applied to image classification, document classification, sentiment analysis, spam mail filtering, disease prediction, and so on [30]. Let D be a classification dataset with n distinct attributes a1, a2, ⋯, an (we denote A = {a1, a2, ⋯, an}), |D| be its number of cases or transactions, and C = {c1, c2, ⋯, cm} be its list of class labels. Then, the i-th case in D can be described as a combination of attribute values dij corresponding to the attributes aj and a class label ck (i = 1,2,⋯,|D|; j = 1,2,⋯,n; k = 1,2,⋯,m). In the classification process, D is first partitioned into two subsets, a training set (denoted as D_train) and a test set (denoted as D_test). The goal of classification is to build a classification model by D_train, which can accurately predict a class label from C for any case in D_test. In addition, the model is evaluated by the classification accuracy τ/|D_test|, where τ is the number of cases whose predicted class labels are exactly their actual class labels, and |D_test| is the number of cases in D_test.
Now, let us compare the classification dataset D with the four components of the FCE. In other words, we will see the classification problem from the FCE viewpoint. We notice that the list of attributes A can be regarded as the factor set U, the list of class labels C can be regarded as the decision set V, and the predicted class label can be regarded as the evaluation result in the FCE. Thus, the FCE model is actually a classifier, and the only thing we need to do is to generate a reasonable fuzzy relational matrix R and a suitable weight vector W from D_train, such that the FCE model B = W ∘ R can be carried out.
In this study, R is constructed by the joint distribution dynamic membership degree and W is constituted by equal weight, both of which will be introduced in Section 3.1 and Section 3.2, respectively. Therefore, our proposed classifier is called the fuzzy comprehensive evaluation based on the joint distribution dynamic membership degree (FCE-JDDMD).
The FCE-JDDMD classifier consists of six phases: input a classification dataset, data preprocessing (which refers to discretizing continuous attribute values), construct the fuzzy relational matrix R, input the weight vector W, implement the FCE model, and output a class label. The overall architecture of the FCE-JDDMD classifier is shown in Fig. 2.

Fig. 2. Overall architecture of the FCE-JDDMD classifier.
In this study, we divide datasets into three categories according to their attribute characteristics: discrete, continuous, and combined. An attribute is called a discrete attribute if its attribute values are nominal or categorical, and it is called a continuous attribute if its attribute values are real. If a dataset consists of discrete attributes, then it is called a discrete dataset, and if it is composed of continuous attributes, then it is called a continuous dataset. A combined dataset refers to a dataset that has both discrete attributes and continuous attributes.
In this section, for a discrete dataset, we propose the joint distribution dynamic membership degree (JDDMD) to construct R; for a continuous dataset or a combined dataset, we employ the equidistant discretization to convert it to a discrete dataset.
The construction of R in a discrete dataset
In this section, we present the JDDMD in detail. The JDDMD is an improvement of the static membership degree (SMD), which is also proposed in this study, so we will present the SMD first.
(1) SMD
Let D be a classification dataset with n distinct attributes a1, a2, ⋯, an, |D| be its number of cases, and C = {c1, c2, ⋯, cm} be its list of class labels. In a discrete dataset, for a given attribute a_{j0} (j0 ∈ {1,2,⋯,n}), its distinct attribute values are limited, and we denote them as e1, e2, ⋯, el. Furthermore, for a given h0 ∈ {1,2,⋯,l}, we assume the attribute value e_{h0} occurs in s cases of D. Apparently, there is a list of class labels corresponding to e_{h0}, and we denote it as C′. Meanwhile, we let c_{k0} ∈ C′ (k0 ∈ {1,2,⋯,m}) and assume c_{k0} occurs t times in these s cases. Now, this can be viewed as the following scene: s persons each decide a class label from C for the attribute value e_{h0}, and t of them choose the class c_{k0}, which exactly conforms to the idea of the fuzzy statistic method. Thus, t/s is the membership degree of the attribute value e_{h0} to the class c_{k0}.
In particular, when s = 0, there is no person to decide a class label for e_{h0}; in this study, we define that if s = 0, then the membership degree is 0.
The mathematical description of the SMD is as follows.
In the dataset D, for the i-th case, we let dij be its attribute value corresponding to the attribute aj and c(i) be its class label. Then, for a given case d′ = (d′1, d′2, ⋯, d′n), the SMD of the attribute value d′j to the class ck is

rjk = |{i : dij = d′j and c(i) = ck}| / |{i : dij = d′j}|,  (1)

where j = 1,2,⋯,n; k = 1,2,⋯,m; and rjk is defined as 0 when the denominator is 0.
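As a concrete companion to Equation (1), the following Python sketch computes the SMD-based fuzzy relational matrix for a given case. The function name smd_matrix is our own illustrative choice, and the toy data are hypothetical, not the paper's tables.

```python
# Sketch of the SMD (Equation (1)): r_jk = t/s, where s counts training
# cases sharing the value d'_j on attribute a_j and t counts how many of
# those s cases carry class c_k; r_jk = 0 when s = 0.
import numpy as np

def smd_matrix(X, y, d_prime, classes):
    """Return the n x m fuzzy relational matrix R for the case d_prime."""
    n, m = len(d_prime), len(classes)
    R = np.zeros((n, m))
    for j in range(n):
        mask = X[:, j] == d_prime[j]        # cases with the same value on a_j
        s = mask.sum()
        if s == 0:
            continue                        # membership degree defined as 0
        for k, c in enumerate(classes):
            R[j, k] = np.sum(y[mask] == c) / s
    return R

# Toy usage (hypothetical data, not the paper's Table 2):
X = np.array([[1, 1], [2, 2], [2, 1]])
y = np.array([1, 2, 2])
print(smd_matrix(X, y, (1, 1), classes=[1, 2]))  # rows: (1, 0) and (0.5, 0.5)
```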
To clearly explain the construction of R using the SMD, a simple example is shown here.
Table 2. A discrete classification dataset
*TID: transaction id.
We consider the given case d′ = (3,2,3,3). a) For the attribute a1, there are five cases whose attribute value is 3: TID = 14, 15, 16, 17, and 18 (i.e., s = 5). In these five cases, the numbers of class 1 and class 2 are 3 and 2, respectively (i.e., t = 3 and t = 2, respectively). Therefore, the membership degrees to class 1 and class 2 are 3/5 = 0.6 and 2/5 = 0.4, respectively. The rows of R for the remaining attributes a2, a3, and a4 are obtained in the same way.
(2) JDDMD
In practice, we noticed that the SMD might lead to an unreasonable evaluation vector. The following is a simple example.
Assume we have a dataset such as that shown in Table 3. Now, let us construct the fuzzy relational matrix R for the given case d′= (1,1).
Table 3. A discrete classification dataset
*TID: transaction id.
According to Equation (1), we obtain R = ((0.5, 0.5), (0.5, 0.5)); that is, each attribute value of d′ has equal membership degrees to class 1 and class 2.
We assume that W = (w1, w2) is the corresponding weight vector, where w1 + w2 = 1. Then, regardless of the value of W, when we use the weighted average operator (i.e., bj = w1·r1j + w2·r2j), the evaluation vector is B = W ∘ R = (0.5, 0.5).
This evaluation vector implies that the membership degree of the given case d′ to class 1 is 0.5 and to class 2 is also 0.5. Because the given case is the same as the first case in Table 3, we believe that they should have the same class label; that is, the given case should definitely belong to class 1, which means the evaluation vector should be (1,0). Thus, the above evaluation vector B = (0.5,0.5) is suspicious. Since B = (0.5,0.5) is unrelated to W, the problem must lie in the SMD. Hence, we come up with the dynamic membership degree, which is based on the joint distribution of attribute values. Its mathematical description is as follows.
Let D be a classification dataset with n distinct attributes a1, a2, ⋯, an, |D| be its number of cases, C = {c1, c2, ⋯, cm} be its list of class labels, and dij be the attribute value of the i-th case corresponding to the attribute aj (i = 1,2,⋯,|D|; j = 1,2,⋯,n). For a given case d′ = (d′1, d′2, ⋯, d′n), the JDCN of d′ to the i-th case is defined as

JDCN(i) = |{j : dij = d′j, j = 1,2,⋯,n}|,  (2)

i.e., the number of attributes on which the i-th case agrees with d′. Let θ be the highest JDCN over all cases and T be the set of the TIDs of the cases attaining θ; then, the cases indexed by T form the highest correlated dataset D′, and the JDDMD is obtained by applying Equation (1) on D′ instead of D.
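The JDCN/JDDMD computation can be sketched in a few lines on top of smd_matrix from the previous sketch. The toy data below are hypothetical but chosen to reproduce the JDCN pattern {2, 0, 1} of the example that follows.

```python
# Sketch of the JDDMD: compute each training case's JDCN (Equation (2)),
# keep the cases attaining the highest JDCN theta (the highest correlated
# dataset D'), then apply the SMD (Equation (1)) on D' instead of D.
import numpy as np

def jddmd_matrix(X, y, d_prime, classes):
    jdcn = np.sum(X == np.asarray(d_prime), axis=1)  # matching attributes per case
    T = jdcn == jdcn.max()                           # cases forming D'
    return smd_matrix(X[T], y[T], d_prime, classes)

# Toy usage: JDCNs are (2, 0, 1), so D' = {first case} and R = ((1, 0), (1, 0)).
X = np.array([[1, 1], [2, 2], [2, 1]])
y = np.array([1, 2, 2])
print(jddmd_matrix(X, y, (1, 1), classes=[1, 2]))
```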
Now, considering Table 3, let us construct R for the given case d′= (1,1) using the JDDMD. First, the JDCNs of the given case to each case in Table 3 are calculated by Equation (2), and they are shown in Table 4.
Table 4. Table 3 with the JDCN
* TID: transaction id.
So, the highest JDCN is θ = max{2,0,1} = 2. Then, we have T = {1}. Hence, the highest correlated dataset D′ is as follows (see Table 5).
Table 5. The highest correlated dataset
* TID: transaction id.
Finally, according to D′, d′ = (1,1), and Equation (1), we get the fuzzy relational matrix R = ((1, 0), (1, 0)).
Furthermore, let us check the evaluation vector after using the JDDMD: B = W ∘ R = (1, 0), which assigns the given case d′ to class 1, as expected.
Next, let us discuss the difference between the SMD and the JDDMD. It is not difficult to find that the former is calculated based on the original dataset D, while the latter is calculated based on the highest correlated dataset D′, which is a subset of D. Let us reconsider the given case d′ = (3,2,3,3) in Table 2 and construct its fuzzy relational matrix using the JDDMD.
First, the JDCNs are calculated by Equation (2), and they are shown in Table 6.
Table 6. Table 2 with the JDCN
* TID: transaction id.
So, the highest JDCN is θ = max{0,1,2,3} = 3. Then, we have T = {12,17,18}. Therefore, the highest correlated dataset D′ is as follows (see Table 7).
Table 7. The highest correlated dataset
* TID: transaction id.
Then, according to D′, d′= (3,2,3,3) and Equation (1), we obtain the fuzzy relational matrix
For a continuous dataset, we use discretization techniques to convert it to a discrete dataset, and then R can be constructed by the method in Section 3.1.1. It should be noted that each continuous attribute needs to be discretized, and in this study, we let all continuous attributes have the same discretization class number.
In this study, the equidistant partition, which divides the continuous values into a finite number of intervals of equal length, is adopted for discretization. Thus, our discretization process can be described as follows.
We assume that a_{j0} is a continuous attribute and d_{ij0} is its attribute value (i = 1,2,⋯,|D|). Let min_{j0} and max_{j0} denote the minimum and maximum of d_{ij0} over i, and let k be the discretization class number. The range [min_{j0}, max_{j0}] is divided into k intervals of equal length Δ = (max_{j0} − min_{j0})/k, namely I_h = [min_{j0} + (h − 1)Δ, min_{j0} + hΔ) for h = 1,2,⋯,k, with the right endpoint included in the last interval I_k; each attribute value d_{ij0} is then replaced by the index h of the interval that contains it.
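A minimal Python sketch of this equal-width rule follows, assuming the interval convention just described (equidistant_discretize is our own illustrative name).

```python
# Sketch of equidistant discretization: split [min, max] into k equal-width
# intervals I_1..I_k and replace each value by the index of its interval.
import numpy as np

def equidistant_discretize(col, k):
    lo, hi = float(col.min()), float(col.max())
    if hi == lo:
        return np.ones(len(col), dtype=int)       # constant attribute: one bin
    idx = np.floor((col - lo) / ((hi - lo) / k)).astype(int) + 1
    return np.clip(idx, 1, k)                     # the maximum falls into I_k

print(equidistant_discretize(np.array([4.3, 5.1, 6.0, 7.9]), 3))  # [1 1 2 3]
```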
Next, let us discuss the determination of the discretization class number k. Since this research is devoted to the classification problem, we can apparently experiment with several values of k and then select the one with the highest classification accuracy as the optimal discretization class number. In this paper, we experiment with k = 2, 3, 4, 5, 6, 7, 8, 9, and 10. Hence, the optimal discretization class number is

k* = argmax_{k ∈ {2,3,⋯,10}} CA(k),  (5)

where CA(k) denotes the classification accuracy obtained with the discretization class number k.
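The selection of k in Equation (5) then amounts to a small grid search. In the following sketch, classify_with_k is an assumed callable that runs the whole FCE-JDDMD pipeline with class number k and returns the accuracy CA(k).

```python
# Sketch of Equation (5): pick the candidate k with the highest CA(k).
# classify_with_k(k) -> accuracy is assumed to be provided by the caller.
def best_class_number(classify_with_k, candidates=range(2, 11)):
    return max(candidates, key=classify_with_k)
```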
To clearly explain the discretization process, a simple example based on Table 8 is shown here.
Table 8. A continuous classification dataset
* TID: transaction id.
For the attribute a1, we have
For the attribute a2, we have
The final discretization result is shown in Table 9.
Table 9. The discretization result of Table 8
*TID: transaction id.
For the given case d′=(5.1,3.1), according to the above I1, I2 and I3, its discretization result is d′=(1,2). Then, its JDCNs can be calculated by Equation (2), and they are shown in Table 10.
Table 10. Table 9 with the JDCN
*TID: transaction id.
Therefore, the highest JDCN is θ = max{0,1,2} = 2. Then, we have T = {1,2}. So, the highest correlated dataset D′ is as follows (see Table 11).
Table 11. The highest correlated dataset
* TID: transaction id.
Finally, according to Equation (1), the fuzzy relational matrix is calculated to be

Fig. 3. The influence of the discretization class number k on the proposed FCE-JDDMD classifier.
Following the approach used for the continuous dataset, we employ the equidistant discretization to convert a combined dataset to a discrete dataset. For details, please refer to Section 3.1.2.
The construction of W
Weight reflects the importance of an attribute. In this study, we choose equal weight to constitute the weight vector W. That is, W = (1/n, 1/n, ⋯, 1/n), where n is the number of attributes.
Before deciding to use equal weight, we also attempted two other types of weight: optimization weight and entropy weight. However, we eventually found that neither was appropriate. a) We formed an optimization function that maximizes the classification accuracy of the training set to calculate the optimization weight. However, the solution of this optimization function is generally not unique, and different solutions may lead to different classification results for a given test case, which is unacceptable in the classification problem. So we gave up the optimization weight. b) When we considered using entropy as the weight, a dynamic entropy corresponding to the JDDMD arises, which evidently leads to more calculation. Since equal weight is simple and does not require additional calculation, we finally chose it as the attribute weight.
Model implementation
In a classification problem, the dataset is first divided into two subsets, a training set and a test set. For each case of the test set, according to Section 3.1 and Section 3.2, we can construct its fuzzy relational matrix R and weight vector W based on the training set. Then, the FCE model B = W ∘ R can be implemented, and the evaluation result (i.e., the predicted class label) is obtained. The steps of the proposed FCE-JDDMD are summarized in Algorithm 1.
For a test case, the time complexity of the FCE-JDDMD is O(|D|mn) where |D| is the number of cases in the dataset D, m is the number of class labels and n is the number of attributes. In addition, there are no parameters in the proposed classifier for a discrete dataset, and there is only one parameter for a continuous dataset or a combined dataset (i.e., the discretization class number). Thus, the proposed FCE-JDDMD classifier is simple.
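Putting the pieces together, a minimal end-to-end sketch of the prediction step for one test case might look as follows; it reuses the smd_matrix and jddmd_matrix sketches from Section 3.1, and fce_jddmd_predict is our own illustrative name, not the paper's Algorithm 1 verbatim.

```python
# End-to-end sketch: JDCN -> highest correlated subset D' -> SMD on D'
# -> B = W ∘ R with equal weights -> principle of maximum membership.
import numpy as np

def fce_jddmd_predict(X_train, y_train, d_prime, classes):
    R = jddmd_matrix(X_train, y_train, d_prime, classes)  # see Section 3.1
    n = X_train.shape[1]
    W = np.full(n, 1.0 / n)            # equal weight vector (Section 3.2)
    B = W @ R                          # weighted average operator
    return classes[int(np.argmax(B))]  # predicted class label
```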
Examples
In this section, we design two examples (a discrete classification problem and a continuous classification problem) to illustrate the proposed classifier. As for the operator ∘ in the FCE model, this paper uses the weighted average operator, that is, bj = Σ_{i=1}^{n} wi·rij.
Table 12. Discrete training data and test data
* TID: transaction id.
Viewing Table 12 from the perspective of the FCE, we have the factor set U = {a1, a2, a3, a4} and the decision set V = {1, 2}.
For the test cases d′=(3,2,3,3) and d″=(2,1,1,2), according to the calculation of the JDDMD, their fuzzy relational matrices are respectively calculated to be
and
By the construction of W in Section 3.2, we obtain the weight vector W = (0.25, 0.25, 0.25, 0.25).
Then, we input R′, R″, and W into the FCE model B = W ∘ R, and we get
As 0.67 > 0.33 and 1 > 0, the evaluation results are class 1 and class 2, respectively, according to the principle of maximum membership. That is, the predicted class label is class 1 for the test case TID = 19, and it is class 2 for TID = 20.
Table 13. Continuous training data and test data
* TID: transaction id.
Observing Table 13 from the FCE point of view, we have the factor set U = {a1, a2} and the decision set V = {1,2}.
We assume the discretization class number is 3 (i.e., k = 3). Then, for the test cases d′ = (5.1,3.1) and d″ = (6,2.9), according to the equidistant discretization and the calculation of the JDDMD, their fuzzy relational matrices are respectively calculated to be (see Example 3)
According to the construction of W in Section 3.2, the weight vector becomes W = (0.5, 0.5).
By inputting R′, R″, and W into the FCE model, we obtain
Thus, when k = 3, the predicted class labels are all class 1. So, CA(3) = 50%.
Doing the same for k = 2, 4, 5, 6, 7, 8, 9, and 10, we finally get
CA(2) = CA(4) = CA(7) = CA(9) = 100%,
CA(5) = CA(6) = CA(8) = CA(10) = 50%.
According to Equation (5), the optimal discretization class number is attained at k = 2, 4, 7, and 9.
In this section, we empirically evaluate our proposed classifier in comparison with some other classifiers.
Experimental setting
The experiments are set as follows.
Table 14. Twelve UCI datasets used in the experiments
*The number of attributes does not include the sequence name of the dataset.
The experimental results are presented in Table 15, where the boldface entries indicate the best values. In Table 15, XGBoost is from https://github.com/dmlc/xgboost, while AdaBoost and SVM are from the machine learning toolkit scikit-learn. The maximum numbers of estimators for both XGBoost and AdaBoost are set to 50. For SVM, the parameter C = 1.0 and the kernel function uses the RBF kernel. Other relevant parameters use the default settings of their interfaces. In addition, the classification accuracy of RIPPER, CBA, MCAR, PCAR, and PCAR2 is from reference [30].
Table 15. Classification accuracy (%) and rank of the proposed classifier against others
The effectiveness of the proposed classifier is discussed in two aspects: average accuracy and average rank.
(1) Average accuracy
As shown in Table 15, the FCE-JDDMD has the highest accuracy on 2 datasets (Monks1 and Iris) and relatively high accuracy on 6 datasets where its rank is 2 or 4 (Balance, Breast-w, Monks2, Breast Tissue, Ecoli, and Wine). In particular, although the FCE-JDDMD ranks 5 on Tic-tac-toe, it actually has the second highest accuracy. In addition, the accuracies on Monks3 are close together; although the accuracy of our classifier there is relatively low, the gap between our classifier and the best classifier is only 98.74% − 97.47% = 1.27%. Overall, the average accuracy of the FCE-JDDMD is 85.20%, which is the second highest: lower than that of XGBoost and higher than those of the other classifiers. Therefore, the proposed classifier outperforms some existing classifiers in terms of classification accuracy.
In addition, although the average accuracy of the FCE-JDDMD is lower than that of XGBoost, since the FCE-JDDMD has at most one parameter as stated in Section 3.3, it is simpler than XGBoost.
(2) Average rank
To conduct a fair comparison, the Friedman test for average ranks is performed [31, 34]. This test is a non-parametric equivalent of the repeated-measures ANOVA (analysis of variance) and was proposed by Milton Friedman in 1937 [35, 36].
During the Friedman test, we first need to rank the classifiers for each dataset. That is, the best performing classifier obtains the rank of 1, the second best obtains the rank of 2, and so on. In particular, when two classifiers have the same performance, the mean rank is assigned. We denote πij as the rank of the j-th of M classifiers on the i-th of N datasets, and let Rj = (1/N) Σ_{i=1}^{N} πij be the average rank of the j-th classifier. Under the null-hypothesis, which states that all the classifiers are equivalent (that is, their average ranks Rj are equal), the Friedman statistic is

χF² = 12N/(M(M+1)) · [Σ_{j=1}^{M} Rj² − M(M+1)²/4],  (6)

and the less conservative statistic derived from it is

F_F = (N − 1)·χF² / (N(M − 1) − χF²).  (7)
From the average ranks in Table 15, we can see that the FCE-JDDMD, XGBoost, SVM, and PCAR have average ranks around 4 (4.21, 4.00, 3.83, and 4.25, respectively), MCAR and PCAR2 have average ranks around 5 (4.88 and 4.75, respectively), RIPPER and CBA have average ranks around 6 (5.96 and 5.58, respectively), and AdaBoost has an average rank of 7.54. That is, the FCE-JDDMD, XGBoost, SVM, and PCAR are superior to the other classifiers. Next, we will test whether this conclusion is credible.
According to Equations (6) and (7), we have χF² ≈ 18.15 and F_F ≈ 2.56.
With 9 classifiers and 12 datasets, F_F is distributed according to the F-distribution with 9 − 1 = 8 and (9 − 1) × (12 − 1) = 88 degrees of freedom. Upon consultation, we know that the critical value F_{0.1}(8, 88) is 1.74 for α = 0.1. Since F_F > F_{0.1}(8, 88), we reject the null-hypothesis. This means that these 9 classifiers are not equivalent and the above conclusion is credible. That is, the proposed FCE-JDDMD, XGBoost, SVM, and PCAR indeed perform better than the other classifiers.
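The computation can be checked directly from the average ranks reported in Table 15; the following sketch evaluates Equations (6) and (7) with M = 9 and N = 12.

```python
# Friedman / Iman-Davenport statistics from Table 15's average ranks.
import numpy as np

avg_ranks = np.array([4.21, 4.00, 3.83, 4.25, 4.88, 4.75, 5.96, 5.58, 7.54])
M, N = len(avg_ranks), 12
chi2_F = 12 * N / (M * (M + 1)) * (np.sum(avg_ranks**2) - M * (M + 1)**2 / 4)
F_F = (N - 1) * chi2_F / (N * (M - 1) - chi2_F)
print(round(chi2_F, 2), round(F_F, 2))  # about 18.15 and 2.56 > 1.74 = F_0.1(8,88)
```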
In conclusion, the proposed FCE-JDDMD has the second highest average accuracy and one of the best average ranks. Thus, it is an outperforming classifier.
This paper puts forward a new classification method based on the FCE. The underlying idea is to regard the class labels in classification problems as the evaluation remarks in the FCE and to regard the attributes in the two domains as consistent. Thus, the classification principle of the proposed FCE-JDDMD classifier is to assign the most probable class label to an object by comprehensively considering the membership degrees of every attribute to the class labels. In the proposed classifier, for a discrete classification dataset, the JDDMD is put forward to construct the fuzzy relational matrix and equal weight is adopted to constitute the weight vector; then the FCE model is executed to obtain the classification result. For a continuous classification dataset, the equidistant discretization is employed to convert it to a discrete dataset, and the determination of the discretization class number is also discussed in this study. Finally, we empirically demonstrate, on a variety of datasets, that the proposed classifier outperforms some existing classifiers.
The proposed FCE-JDDMD classifier not only has good interpretability but is also easy to operate. Therefore, from the perspective of the FCE, this study extensively extends the application of the FCE, such that the FCE innovatively advances in data mining; from the classification viewpoint, this study proposes a novel, simple and outperforming classification method.
Despite the aforementioned advantages, the proposed FCE-JDDMD has several limitations. For one thing, this study adopts equal weight as the attribute weight, which is a subjective weighting. In future work, we will attempt to design a learning algorithm to calculate the weights and thus improve the FCE-JDDMD. For another, in this study, we let each attribute have the same discretization class number for a given continuous dataset; assigning different discretization class numbers to different attributes is worthwhile future research.
