Sage Journals: Discover world-class research

Abstract

In this paper, we proposed classification and clustering algorithms that are proper for analyzing customer-related datasets, which are mostly high-dimensional with too many instances. For the clustering purpose, This paper presents a Cuckoo-Search-based Variable Weighting (CSVW) Clustering algorithm to obtain optimal variable weights of high-dimensional data for each cluster. This paper also proposes a deep Inferarer Classifier for categorizing customers using Bi-Directional Long Short-Term Memory (Bi-LSTM) neural network, which uses a Fuzzy Inferential Classifier on its last layer. The Insurance Company (TIC) and InstaCart datasets are utilized for the experiments and performance evaluation. Simulation results reveal that the proposed clustering algorithm generates appropriate Silhouette and Elbow criteria scores in a few cycles of execution in comparison to ordinal clustering algorithms. Also, the proposed classification algorithm with fuzzy soft-max classifier hits the better Classification Criteria in comparison.

Keywords

Customer clustering Cuckoo optimization variable-sensitive clustering deep learning

1 Introduction

The customer is considered an essential part of a business market, and no business can be successful without customer satisfaction. Therefore, studying customers from different aspects is necessary to have a successful business. Here, data mining and machine learning algorithms help us to cluster, score or categorize customer-related data for further analysis. Separating customers into separate groups allows companies to predict what products their favorite basket contains and thus encourages the customers to buy more products. This can be used in other businesses such as bankruptcy prediction, financial markets, stock exchange, banking, etc. Identifying potential purchasers is a powerful approach to advertising a market or product. For example, an insurance company often wants to know which customers are willing to buy a particular product [1 –5].

The data relating to the customers of the financial markets mainly have high dimensions. In some cases, the characteristics of this data include customer purchases which are sparse in nature. Clustering such type of data is usually a difficult task. Traditional clustering methods generally do not show good results when faced with such high-dimensional data. With the increase in dimensions, the dispersion of the data becomes very large, and the use of methods based on distance measurement in the entire dimension space becomes meaningless. This is because some data features may be irrelevant or redundant in the clustering process rather than the problem of dependent features. In this case, clusters of high-dimensional data are usually embedded in lower-dimensional subspaces. Thus, clusters should often be searched for in subspaces of dimensions rather than the whole dimension space. Clustering is computationally tricky in such conditions [6 –8].

Using the k-means approach for clustering is another well-known technique that uses an iterative approach to minimize the sum of squared errors. But it supposes the same weight for the entire features in the dataset and ignores the effect of more useful features in the clustering process. Weighted clustering, in contrast, assigns normalized weights to each cluster based on its importance, which makes it more appropriate for clustering high-dimensional data. This type of clustering minimizes an objective function based on the k-means algorithm, iteratively. The weights related to each variable in a cluster normalize the Euclidean within cluster distance value [9–11 , 31]. Considering that finding multiple weights in big data is a np-hard problem, optimization techniques can be used in this case.

In this paper, we proposed a new clustering algorithm by defining variable weights for each feature of the customer dataset, which uses the cuckoo search optimization algorithm in the objective function. In our clustering algorithm, instead of using the conventional local search strategies in the clustering process, we proposes a Cuckoo-Search-based optimizer to obtain optimal variable weights. The proposed clustering technique is also independent of the initial centroid of the clusters. The aim of using Cuckoo-Search optimization algorithm was that this algorithm has performed well in various real-world applications and optimization scenarios and specifically, the following reasons can be considered:

Handling large search space: High-dimensional datasets usually have a large search space, and it can be challenging to explore all the possible combinations of features or variables. Cuckoo search algorithm uses a random mutation procedure, which helps to explore the search space efficiently and effectively.

Finding global optimum: High-dimensional clustering problems often have multiple local optima, and it is necessary to find the global optimum. Cuckoo search algorithm uses a random walk strategy to explore the search space and can jump out of local optima to find the global optimum.

Handling noise and outliers: High dimensional datasets often contain noise and outliers, which can affect the clustering results. Cuckoo search algorithm uses an adaptive step size, which helps to avoid getting stuck in noisy regions or outliers.

This paper will also propose an efficient deep-learning architecture for classifying customers based on the customers’ purchase behavior. The proposed deep learning structure uses a Bi-Directional LSTM learning network [19] to learn the investigated dataset’s time-series behavior. In the Bidirectional LSTM architecture, instead of training a single model, two models are presented that can manage long-term data better than the traditional LSTM. Also, this paper used a Fuzzy Inferential Classifier [20] which classifies LSTM Blocks’ output, fuzzily. The aim of innovating this classifier was that in some cases, the customers’ behaviors related to different classes have no significant difference with each other. The rest of paper is organisd as follow. In Section 2, an overview of the previous work is presented. Section 3 briefly describes the background algorithms used in this paper. In Section 4, the architecture of the proposed method has been expressed, containing the Cuckoo-Search-based variable weighting Clustering algorithm and the proposed Deep Inferrer architecture for classification. Experiments and results are discussed in Section 5. The overall method is also concluded briefly in Section 6.

2 Literature review

There are several researches in the literature which work on the clustering techniques for high-dimensional data. Here we summarize the most recent and impressive papers whith their pros and cons. At the end, we will conclude with the reason for presenting this research. The authors in [9] presented a clustering algorithm that is not sensitive to the initial values of the cluster centers. The presented method uses different weights for each variable in the clustering process. The presented method is a fuzzy extension of the standard k-means clustering algorithm, which can automatically determine the number of clusters by repeatedly calculating the weight of the feature values of the dataset and the membership value of each sample in the clusters. The authors in [7] proposed the Particle Swarm Optimizer (PSO) algorithm to cluster high-dimensional data by assigning different weights to each data set feature. Furthermore, the authors propose a feature-weighted fuzzy clustering algorithm based on multistrategy grey wolf optimization in [10].

Authors in [12, 13] have proposed an innovative model of data clustering. In the investigated data set, various features such as performance, interaction and social interactions are used to cluster players. The results of the presented algorithm have been compared with algorithms such as k-means, SOM and hierarchical clustering methods. Also, the authors in [17, 18] used the combination of K-Means and SOM algorithm to cluster customers based on the RFM model on the insurance industry dataset to identify customer needs and understand their characteristics. To overcome the convergence speed problem of large data set clustering methods, authors in [14, 15] proposed using Swarm-based algorithms such as flower pollination algorithm, black hole algorithm, bat algorithm and the like. The authors in [16] proposed a hybrid method using K-Means-based segmentation to predict customer behavior for future transactions. This research uses methods such as decision tree, clustering, rule extraction, and other such methods. The method presented in this article is not only very effective in identifying the behavior of new customers and predicting their profitability.

Also the authors in [30] proposes a modified cuckoo search algorithm (MCSA) for data clustering, which is an optimization technique used to group similar data points together. MCSA is designed to improve the efficiency and effectiveness of traditional cuckoo search algorithms in solving clustering problems. The experimental results showed that the proposed algorithm outperformed several clustering algorithms in terms of accuracy, clustering quality, and computational time. The paper lacks a detailed discussion of the key differences between traditional cuckoo search algorithms and the modified version proposed in the paper. While, the authors do not provide any theoretical analysis to justify the observed performance improvements. Finally, the paper is limited by its focus on benchmark datasets and the exclusion of real-world datasets. [31] proposes a new possibilistic Fuzzy c-means algorithm that utilizes an improved Cuckoo search method for data clustering. The proposed algorithm uses the concept of possibility theory to handle uncertainty in clustering, which allows the data points to belong to multiple clusters simultaneously. The Cuckoo search algorithm is used to optimize the clustering parameters and improve the convergence speed. Experimental results on several benchmark datasets demonstrate that the proposed algorithm outperforms other state-of-the-art clustering algorithms, The paper does not provide a comparative analysis of the proposed algorithm with other possibilistic clustering algorithms, which limits the scope of the study.

Based on the above reviews, any paper, So far, has investigated the cuckoo search optmaziation algorithm for clustering high-dimensional commercial data that was free of initial data centroids with variable weighting for dimensions. Also, few works have been done regarding the optimal classification of this data, which is referred to in this article

2 Background

2.1 Convolutional neural network Layers

Convolutional neural networks (CNNs) are formed by a sequence of processing layers that are connected through neurons. This structure aims to make a dense feature vector from the raw data. Two layers, Convolutional and Pooling, are usually used to build a CNN-based architecture. These layers are usually stacked placed on top of each other and perform different tasks. The data is input directly to the network, and it followed by several stages of convolution and pooling. The Convolutional layer is the main building CNNs and is responsible for extracting features from the input data. It consists of multiple filters or kernels that slide over the input data and perform element-wise multiplication with the data/features under the filter, producing new feature maps.

The structure of the convolution layer is shown in Fig. 1, where L is the level of the network layer, N_L is the number of feature maps, and f_L() is the activation function of the Lth layer. Also, for the convolution layer L, the size of the mask is defined as R_L×C_L.

Fig. 1

Convolution layer block diagram.

Let $W_{m . n}^{L}$ be the convolution mask corresponding to connecting the mth feature map in layer (L-1) to the nth feature map in layer (L), $b_{n}^{L}$ be the bias value of the nth feature map and $V_{n}^{L}$ as a set of features in layer (L-1) connected to the nth feature map in layer (L). In this case, the feature map n of the convolution layer L is calculated in the form of equation 1. Here f_L refers to the activity function in which Relu is widely used.

$y_{n}^{L} = f_{L} (\sum_{m \in V_{n}^{L}} y_{m}^{L - 1} \times W_{m . n}^{L} + b_{n}^{L})$ (1)

The Pooling layer also performs the dimension reduction operation, whose structure is shown in the level (L) in Fig. 2. The function of this layer is to reduce the spatial size of the input width and height to reduce the number of parameters and calculations in the network and control overfitting. The most widely used dimension reduction operations are max pooling and average pooling operations, which extract the maximum value and the average value of the specified range, respectively. In the pooling layer, the nth feature map from the convolution layer (L-1) is divided into non-overlapping blocks (for example, 2x2 average pooling).

Fig. 2

Pooling Layer block diagram.

2.2 Bi-Directional LSTM

BI-LSTM combines the LSTM algorithm and the bi-directional recurrent neural network (Bi-RNN). Graves and Schmidhuberl first proposed Bi-LSTM for phoneme classification [19]. In B-LSTM (Fig. 3), in the first round, LSTM is applied forward, and in the second round, it is reversed, i.e., backward. Data trained late in the LSTM neural network responds better in the test phase than data trained early in the neural network. Therefore, the data is trained once from beginning to end and from the end to the beginning. This structure allows networks to have back-and-forth information about the sequence at any time. The output of the LSTM blocks then fed into a classifier to classify the input data [19].

Fig. 3

Schematic of the BI-LSTM algorithm.

2.3 Cuckoo search optimization algorithm

Cuckoo Search Optimization Algorithm is inspired by the lifestyle (such egg-laying and reproduction) of a family of birds called Cuckoos. This algorithm has performed well in various real-world applications and optimization scenarios. The initial population of the cuckoo optimization algorithm produced by cuckoo eggs and adult cuckoos. Each egg in each nest represents a solution. This bird has the habit of laying eggs in other birds’ nests. The eggs will develop into adult cuckoos in the host bird’s house if not detected and destroyed by the adult host birds. The adult cuckoos may migrate to the new group, hoping to find an optimal place for reproduction and life. The algorithm aims to replace the not-so-good solutions in the nests with new and possibly better solutions (eggs of cuckoos). Depending on the problem’s complexity, each nest may contain one or several eggs (solutions) [21 –23].

The population of cuckoos is initialized through random initialization of variables regarding the desired optimization problem.

A number of eggs are assigned to each cuckoo. This could be bounded between N_min and N_max.

For each cuckoo, the Egg-Laying Radius (ELR) is specified. Each cuckoo lays eggs within the specified laying radius. The ELR is described as below:

$\begin{matrix} ELR = & α \times \frac{Number of current {cuckoo}^{'} s eggs}{Total number of eggs} \\ \times (va r_{h} - va r_{l}) \end{matrix}$

In the above equation, α is a parameter that controls the maximum value of the spawning radius, var_h represents the upper limit and var_l shows the lower limit of the decision variables.

The host bird destroys % ρ of Cuckoo eggs. This value usually equals ten percent of the laid eggs and includes solutions that cannot converge to the optimal solution.

The habitat of each adult cuckoo is evaluated (the amount of profit is determined). According to the N_max parameter, the cuckoo population in the environment is controlled, and the cuckoos that live in the worst habitats are destroyed.

The cuckoo population is clustered, the average profit of cuckoo groups is calculated and the best habitat is selected as the migration destination of other groups.

The cuckoo population migrates towards the selected optimal habitat. If the population of cuckoos converges to the global optimal point or more than 95% of the population of cuckoos converges to a specific habitat, the execution of the algorithm is stopped; Otherwise, the execution of the algorithm is transferred to step 2 and the algorithm is executed again.

3 Proposed method

As stated, in this paper we proposed classification and clustering algorithms that are proper for analyzing customer-related datasets. For the clustering purpose, This paper proposes a Cuckoo-Search-based variable weighting Clustering algorithm to obtain optimal variable weights of high-dimensional data to each of the clusters. This paper also proposes a deep Inferarer Classifier for categorizing customers using Bi-Directional Long Short-Term Memory (Bi-LSTM) neural network which uses a Fuzzy Inferential Classifier on its last layer. A details description of each of the proposed algorithms is presented as follows.

3.1 Cuckoo-Search-based variable weighting Clustering algorithm

In this section, to optimally weight features in high-dimensional data clustering, we have proposed a Cuckoo-Search-based variable weighting Clustering algorithm called CSVW. To transform the upgraded k-means minimization problem by a cuckoo search optimizer, we used the variable weights matrix in the objective function. In the following, we provide a detailed description of the proposed CSVW algorithm. Our new algorithm uses a weighted k-means objective function, which calculates the sum of the intracluster distances of each cluster with more attention to its associated features. If the feature from the data set is founded to be unrelated to that cluster, a small weight is assigned to it, and as a result, it will not affect the total intra-cluster distance.

On the other hand, features that have a high correlation with a cluster get larger weights. This makes these variables to be more helpful in identifying data objects to their associated cluster. The presented clustering performance largely depends on using a suitable objective function and an efficient search strategy. The objective function determines the quality of the clustering and the search strategy affects the achievement of the optimality of the objective function [7].

3.1.1 Cuckoo Search for variable weighting (CSVW)

Suppose we have dataset X with n data objects and m dimensions which are supposed to be portioned into k clusters. U, Z, and W represent the membership matrix of the data objects to the clusters, the cluster centroids matrix, and the weight matrix associating dataset features to the clusters. Throughout this paper, we adopt the notations in Table 1.

Table 1
The description of notation in the proposed method

Notation Description

k Clusters number

n Data objects number

m dimensions number

ui,l The degree of membership for data object i in cluster l

xi,j Data object i value on dimension j

zl,j The centroid of cluster l on dimension j

wl,j Weight of dimension j on cluster l

U The membership matrix for data objects

Z Cluster centroids matrix

W Weights matrix

Notation	Description
k	Clusters number
n	Data objects number
m	dimensions number
ui,l	The degree of membership for data object i in cluster l
xi,j	Data object i value on dimension j
zl,j	The centroid of cluster l on dimension j
wl,j	Weight of dimension j on cluster l
U	The membership matrix for data objects
Z	Cluster centroids matrix
W	Weights matrix

First, we define the initial cuckoo swarm. Each cuckoo includes a complete weight matrix W_m,k and the Z matrix (which initially holds k different data objects randomly chosen out of all data objects). Each wl,j cell in matrix W corresponds to the weight of dimension l alonge cluster j. For each cluster we initially set wl, j = 1/m related to each dimension. The process iterates by updating U, Z, and W matrices using equations 2–4. The whole process terminates until reaching the maximum iteration or hitting the minimum value for the objective function.

The algorithm details are as below:

Input:

The position swarm W of variable weights are set randomly. Each object in the W swarm represents a complete weight matrix relating dimensions to the clusters.

The swarm Z of cluster centroids is k different data objects randomly chosen out of all data objects. Each object in the Z swarm represents a complete cluster centroid vector which is directly related to the corresponding index of the W matrix.

The Z and W swarms are segmented in multiples of beans as cuckoo eggs in nests. Then we have nests of solutions, a portion of Z and W swarms.

Repeat:

–For each of the eggs in a nest, the U matrix, which holds the cluster membership information of any data objects has been updated using equation (2), based on the current

${\begin{matrix} u_{l, i} = 1 if \sum_{j = 1}^{m} {(\frac{w_{l, i}}{\sum_{j = 1}^{m} w_{l, i}})}^{β} . d (x_{i, j}, z_{l, j}) \leq \sum_{j = 1}^{m} {(\frac{w_{q, i}}{\sum_{j = 1}^{m} w_{q, i}})}^{β} . d (x_{i, j}, z_{l, j}) for for 1 \leq q \leq k \\ u_{l, i} = 0 for q \neq l \end{matrix}$ (2)

This is a K-means based function which consider the distance to the centroids based on the weighting coefficients. Here, β is a user-defined parameter that has non-negative values. A large value of β makes the objective function to be more sensitive to the changes in weight values. Its large values cause variables with high weights to play a strong role in distinguishing between related and unrelated variables (dimensions). For the case β=0, it is as like as k-means algorithm.

–After determining the matrix U, equation (3) has updated the Z matrix.

$\begin{matrix} Z_{l, j} = (\sum_{i = 1}^{n} u_{l, i} . x_{i, j}) / (\sum_{i = 1}^{n} u_{l, i}) \\ for 1 \leq l \leq k and 1 \leq j \leq m \end{matrix}$ (3)

–Holding the Z and the U matrixes, the W matrix has been updated using equation (4). Here h is a parameter that controls the influence of X on w_l,i.

$w_{l, j} = \frac{\exp (- X_{l, j} / h)}{\sum_{t = 1}^{m} \exp (- X_{l, t} / h)}$ (4) –Calculating the fitness of each solution (an egg in a nest) using Eqation 5. $\begin{matrix} F (W) = \sum_{l = 1}^{k} \sum_{i = 1}^{n} \sum_{j = 1}^{m} u_{l, i} . {(\frac{w_{l, j}}{\sum_{j = 1}^{m} w_{l, j}})}^{β} . d \\ (x_{i, j}, z_{l, j}) \end{matrix}$ (5) $s . t {\begin{matrix} 0 \leq w_{l, j} \leq 1, \\ \sum_{l = 1}^{k} u_{i, l = 1} ui, l \in {0, 1}, 1 \leq i \leq n . \end{matrix}}$

Here we have used a normalized representation of variable weights in the objective function, and its optimized value is investigated using the Cuckoo Search algorithm. The optimization algorithm looks for the appropriate weight for each feature to minimize the sum of all intra-cluster distances.

% ρ of Cuckoo eggs in each nest with the lowest fitness value will be destroyed and replaced with the new solution.

Three nests (this is arbitrary and deduced by the try and error) with the lowest average fitness value will migrate to the three best nests, randomly.

Until: (reaching the maximum iteration or when 95% of the population of cuckoos converges to a specific habitat)

3.2 Deep Inferrer architecture for classification

In the proposed classification architecture that is depicted in Fig. 4, we employ convolution and pooling layers, recursively. In the pooling layer, a feature vector is obtained by considering a maximum operation. In the following, the Bi-LSTM layers are stacked to learn the time series behavior of the data. Inspired by [20], we used soft-max classifier for classifying the output of the Bi-LSTM.

Fig. 4

The presented classifier block diagram.

Fig. 5

The presented classifier block diagram.

3.3 –FuzzySoft-max classifier

As a variation of the Soft-max logistic function, the Soft-max classifier reports back the “confidence score” for each class. In its simplest mode, the final class is the item with the highest confidence score in the list. By the m different blocks, the Bi-LSTM layer has the output form shown as Equation 8. In the last phase, the Soft-max classifier in the Dense layer transforms the raw outputs of the Bi-LSTM layer to the class probabilities. The classification of customers is executed by selecting the one with the highest score [20, 25].

$HV = (h_{1} . h_{2} . \dots . h_{m})$ (6) The Soft-max Classifier(SC) determines the final state using Equation 7.

$\hat{y} = SC (HV) = \underset{c \in C}{argmax} (\frac{\exp (T_{C_{j}} (HV))}{\sum_{j = 1}^{m} \exp (T_{c_{j}} (HV))})$ (7)

The T_{C
_j} (HV) stands for the probability of selecting class C_j as the final calss [20, 25].

An Intuitive Fuzzy Set (IFS) A in the domain of X is defined as an object as follows where μ_A: X ⟶ [0, 1] defines membership degree and ϑ_A: X ⟶[0,1] refers to non-membership degree of an object.

$A = {(x, μ_{A} (x), ϑ_{A} (x)) | x \in X}$ (8)

For every x ∈ X, 0≤μ_A (x) + ϑ_A (x)≤1. Authors in [26] stated that IFS Intuitionistic Fuzzy Number (IFN) could the membership degree and non-membership degree of IFN and the related rankimg method. The authors in [28] propose a method that uses score and accuracy functions to calculate IFN. In this paper, we use a variety of score functions introduced by [26]. Supposing the Intuitionistic Fuzzy α_i = μ_i, ϑ_i, then the score function is presented as Equation 9.

$S (α_{i}) = \frac{1 + μ_{i} - ϑ_{i}}{2}$ (9)

Considering the membership and non-membership value of the Bi-LSTM h state to the corresponding classes the T_{C
_j} (HV) would be as below. T_{C
_j} (HV) corresponds to the probability value of selecting class C_j [20].

$T_{c_{j}} (HV) = \sum_{i = 1}^{n} (s_{c_{j}} (h_{i}) f (h_{i} \times v_{c_{j}} (h_{i})) + b_{j}$ (10)

In this equation, the v_{c
_j} (h_i) stands for the weight of the arc connected the h_i input node to the C_j output class. f (x) is an activation function. The fuzzy significance of the input node (h_i) associated to the output class (C_j) symbolized with s_{C
_j} (h_i). This is expressed by Equation 11. In this equation, μ_{C
_j} (h_i) and ϑ_{C
_j} (h_i) corresponds to the fuzzy and non-membership membership of the input node h_i associated to the output class C_j, respectively.

$s_{C_{j}} (h_{i}) = (\frac{1 + μ_{C_{j}} (h_{i}) - ϑ_{C_{j}} (h_{i})}{2})$ (11)

Considering V as a weight vector that connects each of the Bi-LSTM layer nodes to the associated output class, the values of the parameters μ_{C
_j} (h_i) . ϑ_{C
_j} (h_i) are determined. Where, v_nj points to the weight of the arc connecting n’th input node in the LSTM layer to the output class (C_j).

$V = (v_{11} . v_{12} \dots . v_{16} \dots \dots . v_{mj})$ (12)

$μ_{C_{j}} (h_{i}) = (v_{ij})$ (13)

$ϑ_{C_{j}} (h_{i}) = \frac{\sum_{o = 1 . o \neq j}^{m} μ_{c_{o}} (h_{i})}{(m - 1)}$ (14)

4 Implementation

In this section, we conducted multiple experimental evaluations to determine the validity of the proposed method on two famous commercial datasets. The investigated datasets are Insurance Company Benchmark dataset [27] and Instacart Dataset [28]. The proposed Fuzzy Soft-max classifier is implemented using the Keras library [29]. In the classification task, the data are segmented into mini-batches of 100 through training and testing phases. The learning rate sets to 10e^-3 with a decay rate of 0.95. For the regularization purpose, we used a drop-out strategy with probability p = 0.3 [21]. The main parameters of the proposed deep neural network have been adjusted by defining a validation subset.

Table 2
The parametes of the CSVW

Parameter Values

population Initial 10

Minimum number of eggs 5

Maximum number of eggs 10

Maximum number of iterations 100

Nmax 80

ELR 5

β 8

Population variance for process termination 10-15

Parameter	Values
population Initial	10
Minimum number of eggs	5
Maximum number of eggs	10
Maximum number of iterations	100
Nmax	80
ELR	5
β	8
Population variance for process termination	10-15

Also for the variable weight clustering algorithm, we set the parameters as below. The β parameters were adjusted using experiments as 8.

As our proposed method consists of both clustering and classification tasks, here we introduce the investigated criteria for evaluating the proposed method.

4.1 Clustering criteria

4.1.1 Silhouette

This criterion is a tool for identifying the quality of clustering and selecting members in a cluster. This value has a value between 1 and -1. The closer the number is to one, the better the clustering and negative values mean the selected cluster is not good and the data may belong to neighboring clusters. Euclidean or Manhattan distance can be used for the distance in this criterion. In general, if the overall value of this criterion is high in a cluster, it means that the cluster in question has clustered the data better. Here, this criterion is used to select the number of clusters. That is, the higher the value of this metric in any number of clusters, that is, clustering with that number of clusters is better for the data. Finally, to better express this criterion, it states whether there is a large gap between each sample and all other samples in the same cluster or among different clusters. Silhouette Visualizer is a visualization tool for the Silhouette benchmark that displays data graphically in its clusters. Based on this, the value between 1 and -1 shows the data graphically and the appropriate cluster can be selected from the events.

4.1.2 Elbow

This metric is one of the methods used to select the number of clusters in unsupervised clustering data. It is clear that if we choose too few clusters, data from the data set will be lost and will not be placed in the right cluster, on the other hand, if the number of clusters is chosen too small, small differences will become important and we will not have a proper clustering, this metric using the location The intersection of two graphs defines the clustering proposal.

4.2 Classification criteria

In order to evaluate the performance of machine learning algorithms for classification problems, different criteria can be used. The following criteria are used in this article:

$Accuracy (%) = 100 * \frac{TP + TN}{TN + FN + TP + FP}$ (15)

$Precision (%) = 100 * \frac{TP}{TP + FP}$ (16)

$Recall (%) = 100 * \frac{TP}{TP + FN}$ (17)

$F 1 - Score (%) = \frac{2 * Recal * precision}{Recal + precision}$ (18)

4.3 Dataset

4.3.1 The Insurance Company Benchmark dataset

This dataset proposed in the CoIL 2000 challenge contains customer information of an insurance company [27]. This dataset includes 86 variables related to customers (including demographic variables (1-43) and product ownership variables (44-86)). The data is for predicting the purchase of caravan insurance policies by users and includes 5822 records. All customers living in areas with the same zip code are assumed to have the same demographic characteristics. Variable 86 (Purchase) indicates whether the customer has purchased a caravan insurance policy or not.

The dataset includes three species:

tic_2000_train_data: dataset for training a validating prediction models and creating explanations (5822 customer records).

tic_2000_eval_data: dataset for predictions (4000 customer records). Its format is similar to the training data, only it does not have a target variable. Participants are supposed to return only the list of predicted targets.

tic_2000_target_data: is the “target” variable for the evaluation dataset.

4.3.2 Instacart dataset

It shows the order information of more than 200 thousand users (206,209) over time. This data includes 3 million orders of these users from the Instacart store, ultimately aiming to predict the user’s next order and his next choice. In this data, the order day and week and the time between each user’s orders are specified. This dataset contains 7 CSV files as follows [28]:

Aisles.csv (aisle_id, aisle): details of aisles

Departments.csv (department_id, department): details of department

Products.csv (product_id,product_name,aisle_id,department_id): details of a product

Orders.csv (order_id, user_id, eval_set(prior/train/test), order_number, order_dow, order_hour_of_day, days_since_prior_order)

It consists of order details placed by any user. Prior data: The history of users’ orders. Train data: Current order data of every user. Test data: Future order data for each of the users.

Order_product_prior.csv (order_id,product_id,add_to_cart_order, reordered): consists of all product details for a train

order add_to_cart_order: denotes the sequence in which products were added to the cart. reordered: product is reordered? (1/0)

Order-product_train.csv (order_id, product_id, add_to_cart_order, reordered): consists of all product details for any prior order

sample_submission.csv (order_id,products) Fig. 6 illustrates the 30 most populare Aisles names that are ordered by customers.

Fig. 6

Visualization of the Silhouette score values for different Cluster numbers of InstaCart dataset from k = 2 to 7.

Fig. 7

Visualization of the Elbow distortion values for different Cluster numbers of InstaCart dataset from k = 2 to 9.

4.4 Performance evaluation on InstaCart Dataset

To determine the best cluster number, we examined cluster numbers from k = 2 to 9. The results are illustrated in Figs. 8 and 9, where the Silhouette and Elbow criteria were plotted. As can be seen, the best cluster number matched on k = 4.

Fig. 8

The tops 10 most ordered products Aliases by the customers from each cluster.

Fig. 9

Visualization of the Silhouette score values for different Cluster numbers of Insurance Company dataset from k = 2 to 7.

By determining the appropriate cluster number, we extracted the top 10 most ordered products Aliases by the customers from each cluster. Those are illustrated in Fig. 9.

A first analysis of the clusters it is confirmed that:

fresh fruits

fresh vegetables

packaged vegetable fruits

yogurt

packaged cheese

milk water seltzer sparkling water

chips pretzels are products which are generally bought by the majority of the customers.

4.4 Performance evaluation on Insurance Company dataset

As stated, the insurance company is a supervised dataset that includes information about the customers who decide to purchase Caravan insurance or not. Then, we can further and better analyze the proposed clustering method. Figures 10 and 11 confirm that the segmentation of data into two clusters is the best case for this dataset.

Fig. 10

Visualization of the Elbow distortion values for different Cluster numbers of Insurance Company dataset from k = 2 to 9.

Fig. 11

Accuracy and loss plot for the Insurance Company dataset, through running the epochs a) using Soft-max classifier b) using proposed classifier.

Fig. 12 (a) and (b) show the training accuracy and loss rate using the ordinal soft-max and proposed classifier on the Insurance Company dataset by running the epochs. As it is illustrated, with the passage of epochs, the accuracy of training and testing data increases in general, and the amount of loss rate decreases. As depicted in Fig. 6(b), the proposed classifier has reached a better accuracy and loss rate compared to the ordinal Soft-max classifier. This is due to the fuzzy inference capability that our proposed classifier is equipped with. Also in the proposed Fuzzy Soft-max classifier, the validation accuracy and loss have no significant difference with trian accuracy and loss, which indicates the superiority of the proposed classifier. Table 3 reported the classification criteria that are generated on test data. The results indicated that the performance criteria generated using the proposed classifier increased by about 4% compared to the Soft-max classifier.

Fig. 12

The Comparison of Silhouette score (a) for instacart and (b) insurance company datasets.

Table 3

Classification criteria for the Insurance Company dataset

	Precision		Recall		f1-score		Accuracy
Classifier	Soft-max	Fuzzy Soft-max	Soft-max	Fuzzy Soft-max	Soft-max	Fuzzy Soft-max	Soft-max	Fuzzy Soft-max
1	0.86	0.96	0.88	0.96	0.87	0.95	0.91	0.965
0	0.96	0.99	0.96	0.99	0.96	0.98
micro avg	0.94	0.98	0.94	0.98	0.94	0.97
macro avg	0.91	0.97	0.92	0.97	0.91	0.96

4.5 Comparison with literature

We compare the proposed clustering algorithm with some co-related articles in the literature for better clarification. As the investigated researches differ in terms of dataset and evaluation criteria, we simulated them with the information obtained from the method presented in the article. The results are illustrated in Fig. 13(a) and (b), both for instacart and insurance company datasets in terms of Silhouette score. As depicted, the proposed method reaches a superior Silhouette score in comparison.

To ensure the signification and robustness of the proposed model and inspired by [29] we perform both the Wilcoxon signed-rank and Friedman test. the Wilcoxon signed-rank test can be used to make simple pairwise comparisons, and the Friedman test is effective for making multiple comparisons, The Wilcoxon signed-rank test assumes that there is no significant difference in the Silhouette score between the two models being compared, while the Friedman test starts with the hypothesis that there is no significant variation in the Silhouette score of the models being compared. As an important parameter, the significance level indicates the probability of obtaining the observed test statistic. We set this parameter to ∞=0.005 for both methods,. The original hypothesis is rejected if the critical value W is below 100, and the p-value is less than 0.05. Table 4 compares the two statistical methods results for the investigated methods in comparison. As illustrated, for the instacart dataset, the least significant difference is between the proposed method and the method presented in [7] while for the this happens for the insurance company dataset this happens in comaprison with [9].

Table 4
Results of Wilcoxon signed-rank test and Friedman test for instacart and insurance company datasets

Datasets Compared model Wilcoxon signed-rank test Friedman test

W=100 P-value P-value

Instacart Proposed vs. Kmeans 2 0.00002 H₀: e1 = e2 = e3 = e4 F = 205.104 p = *0.0000** (Reject H0)*

Proposed vs. Fuzzy Kmeans 88 0.02005

Proposed vs. Kmeans GWO 75 0.00511

Proposed vs. Kmeans PSO 99 0.40011

Insurance Proposed vs. Kmeans 24 0.00031 H₀: e1 = e2 = e3 = e4 F = 231.416, p = *0.0000** (Reject H0)*

Company Proposed vs. Fuzzy Kmeans 99 0.04121

Proposed vs. Kmeans GWO 90 0.02973

Proposed vs. Kmeans PSO 94 0.03321

Datasets	Compared model	Wilcoxon signed-rank test	Friedman test
Instacart	Proposed vs. Kmeans	2	0.00002	H₀: e1 = e2 = e3 = e4 F = 205.104 p = 0.0000* (Reject H0)*
	Proposed vs. Fuzzy Kmeans	88	0.02005
	Proposed vs. Kmeans GWO	75	0.00511
	Proposed vs. Kmeans PSO	99	0.40011
Insurance	Proposed vs. Kmeans	24	0.00031	H₀: e1 = e2 = e3 = e4 F = 231.416, p = 0.0000* (Reject H0)*
Company	Proposed vs. Fuzzy Kmeans	99	0.04121
	Proposed vs. Kmeans GWO	90	0.02973
	Proposed vs. Kmeans PSO	94	0.03321

5 Conclusion

In this paper, we proposed classification and clustering algorithms for analyzing customer-related datasets. As the customer related datasets usually contain high domensional data we present a Cuckoo-Search-based Variable Weighting (CSVW) Clustering algorithm to obtain optimal variable weights of such dimensions. As we explained in detail in section 3, the proposed clustering algorithm does not dependent on the initial centers of clusters, and it takes advantage of the cuckoo algorithm to avoid falling into the trap of local optimum. This paper also proposed a deep Inferarer Classifier for categorizing customers using Bi-Directional Long Short-Term Memory (Bi-LSTM) neural network, which uses a Fuzzy Inferential Classifier on its last layer. The Insurance Company (TIC) and InstaCart datasets are utilized for the experiments and performance evaluation. Simulation results reveal that the proposed clustering algorithm generates appropriate Silhouette and Elbow criteria scores in a few cycles of execution in comparison to similar clustering algorithms in literature. Also, the proposed classification algorithm with fuzzy soft-max classifier hits the better Classification Criteria in comparison. In the future we aim to propose an ensamble clustering algorithm that uses variety of optimization algorithms for clustering other type of high-dimensional data, such as text-based data

Data availability statement

The data that support the findings of this study are

openly available in [Kaggle] at [https://www.kaggle.com/kushshah95/the-insurance-company-tic-benchmark], reference number [27].

openly available in [Kaggle] at [https://www.kaggle.com/c/instacart-market-basket-analysis], reference number [28].

Conflict of interest

There is no conflict of interest.

References

Griva

, Bardaki

, Pramatari

, Papakiriakopoulos

Retail business analytics: Customer visit segmentation using market basket data, Expert Systems with Applications 100 (2018), 1–6.

Budhi

G.S.

, Chiong

, Wang

Resampling imbalanced data to detect fake reviews using machine learning classifiers and textual-based features, Multimedia Tools and Applications 80(9) (2021), 13079–97.

Bahari

T.F.

, Elayidom

M.S.

An efficient CRM-data mining framework for the prediction of customer behaviour, , Procedia Computer Science 46 (2015), 725–31.

Astakhova

N.N.

, Demidova

L.A.

, Nikulchev

E.V.

Forecasting of time series’ groups with application of fuzzy c-mean algorithm, Contemporary Engineering Sciences 8(35) (2015), 1659–1677.

Brito

P.Q.

, Soares

, Almeida

, Monte

, Byvoet

Customer segmentation in a large database of an online customized fashion business, , Robotics and Computer-Integrated Manufacturing 36 (2015), 93–100.

Pakhira

M.K.

A fast k-means algorithm using cluster shifting to produce compact and separate clusters, Int J Eng 28(1) (2015), 35–43.

, Wang

, Li

, Zhou

Particle swarm optimizer for variable weighting in clustering high-dimensional data, Machine Learning 82(1) (2011), 43–70.

Cateni

, Colla

, Vignali

, Vannucci

Data Pre-processing for Efficient Design of Machine Learning-Based Models to be Applied in the Steel Sector. In Cybersecurity workshop by European Steel Technology Platform (2021) (pp. 13–27). Springer, Cham.

Khan

, Luo

, Huang

J.Z.

, Shahzad

Variable weighting in fuzzy k-means clustering to determine the number of clusters, IEEE Transactions on Knowledge and Data Engineering 32(9) (2019), 1838–53.

10.

Liu

, Wang

, Chao

A feature weighted fuzzy clustering algorithm based on multistrategy grey wolf optimization, Journal of Electrical and Computer Engineering (2021), 2021.

11.

Abtahi

, Sahraeian

, Rahmani

A stochastic model for prioritized outpatient scheduling in a radiology center, International Journal of Engineering Transactions A: Basics 33(4) (2020).

12.

, Chen

, Shi

Y.T.

, Bose

, Cai

User segmentation for retention management in online social games, , Decision Support Systems 101 (2017), 51–68.

13.

Holý

, Sokol

, Cerný

, Clustering retailproducts based on customer behaviour, Applied Soft Computing 60 (2017), 752–62.

14.

Yoo

, Jang

A bibliographic survey of business models, service relationships, and technology in electronic commerce, , Electronic Commerce Research and Applications 33 (2019), 100818.

15.

Kaur

, Pal

S.K.

, Singh

A.P.

Hybridization of Chaos and Flower Pollination Algorithm over K-Means for data clustering, , Applied Soft Computing 97 (2020), 105523.

16.

Khalili-Damghani

, Abdi

, Abolmakarem

Hybrid soft computing approach based on clustering, rule mining, and decision tree analysis for customer segmentation problem: Real case of customer-centric industries, , Applied Soft Computing 73 (2018), 816–28.

17.

Arunachalam

, Kumar

Benefit-based consumer segmentation and performance evaluation of clustering approaches: An evidence of data-driven decision-making, , Expert Systems with Applications 111 (2018), 11–34.

18.

Qadadeh

, Abdallah

Customers segmentation in the insurance company (TIC) dataset, , Procedia Computer Science 144 (2018), 277–90.

19.

Onan

Bidirectional convolutional recurrent neural network architecture with group-wise enhancement mechanism for text sentiment classification, Journal of King Saud University-Computer and Information Sciences 34(5) (2022), 2098–117.

20.

Bourjandi

, Yadollahzadeh-Tabari

, GolsorkhtabariAmiri

Predicting user’s movement path in indoor environments using the stacked deep learning method and the fuzzy soft-max classifier, IET Signal Processing (2022).

21.

Yang

X.S.

, Deb

Cuckoo search: recent advances and applications, Neural Computing and Applications 24(1) (2014), 169–174.

22.

Yang

X.S.

, Deb

Engineering optimisation by cuckoo search. arXiv preprint arXiv:1005.2908 (2010).

23.

Rajabioun

Cuckoo optimization algorithm,, Applied Soft Computing 11(8) (2011), 5508–5518.

24.

Kahramanli

A modified cuckoo optimization algorithm for engineering optimization,, International Journal of Future Computer and Communication 1(2) (2012), 199.

25.

Softmax function based intuitionistic fuzzy multi-criteria decision making and applications,, Operational Research 16(2) (2016), 327–48.

26.

Verma

, Sharma

B.D.

New operations over hesitant fuzzy sets,, Fuzzy Information and Engineering 5(2) (2013), 129–46.

27.

Van Der Putten

, van Someren

CoIL challenge 2000: The insurance company case (pp. 2000–09).Technical Report 2000–09, Leiden Institute of Advanced Computer Science, Universiteit van Leiden (2000). Available from:http://www.liacs.nl/putten/library/cc2000.

28.

Wan

, Wang

, Liu

, Bennett

, McAuley

Representing and recommending shopping baskets with complementarity, compatibility and loyalty. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management 2018 Oct 17 (pp. 1133–1142).

29.

M.W.

, Xu

D.Y.

, Geng

, Hong

W.C.

A hybrid approach for forecasting ship motion using CNN–GRU–AM and GCWOA, , Applied Soft Computing 114 (2022), 108084.

30.

Mohanty

P.P.

, Nayak

S.K.

A modified cuckoo search algorithm for data clustering,, International Journal of Applied Metaheuristic Computing (IJAMC) 13(1) (2022), 1–32.

31.

Long

N.T.

, Hai

H.T.

, Van Hai

, Van Tam

A possibilistic Fuzzy c-means algorithm based on improved Cuckoo search for data clustering, , Journal of Military Science and Technology (CSCE6) (2022), 3–15.

Customer analysis using a deep inferarer classifier and a variable-sensitive clustering algorithm optimized by the Cuckoo search method

Abstract

Keywords

1 Introduction

2 Literature review

2 Background

2.1 Convolutional neural network Layers

3 Proposed method

3.1 Cuckoo-Search-based variable weighting Clustering algorithm

3.1.1 Cuckoo Search for variable weighting (CSVW)

Table 2 The parametes of the CSVW Parameter Values population Initial 10 Minimum number of eggs 5 Maximum number of eggs 10 Maximum number of iterations 100 Nmax 80 ELR 5 β 8 Population variance for process termination 10-15

4.1.1 Silhouette

4.1.2 Elbow

4.2 Classification criteria

4.3.1 The Insurance Company Benchmark dataset

4.3.2 Instacart dataset

Data availability statement

Conflict of interest

References

Table 2
The parametes of the CSVW

Parameter Values

population Initial 10

Minimum number of eggs 5

Maximum number of eggs 10

Maximum number of iterations 100

Nmax 80

ELR 5

β 8

Population variance for process termination 10-15