Sage Journals: Discover world-class research

Abstract

Feature selection is a complicated multi-objective optimization problem in machine learning: it aims to find the best subset of features while maintaining high classification accuracy, which is a difficult task. In this paper, we design a fitness function that jointly optimizes the classification accuracy and the number of selected features through linear weighting. We then propose two hybrid meta-heuristic methods, hybrid basic bald eagle search-particle swarm optimization (HBBP) and hybrid chaos-based bald eagle search-particle swarm optimization (HCBP), which alleviate the drawbacks of bald eagle search (BES) by exploiting the advantages of particle swarm optimization (PSO) to optimize the designed fitness function efficiently. Specifically, HBBP is proposed to overcome the disadvantages of the original algorithms (i.e., BES and PSO), and HCBP is proposed to further improve the performance of HBBP. Moreover, a binary optimization scheme is used to transfer the solution space from continuous to binary. To evaluate the proposed methods, 17 well-known data sets from the UCI repository and a set of well-established algorithms from the literature are employed, and the methods are compared in terms of fitness value, classification accuracy, computational time and number of selected features. The results support the superiority of the proposed hybrid methods over the basic optimizers and the comparative algorithms on most of the tested data sets.

Keywords

Feature selection; hybrid meta-heuristic; bald eagle search; particle swarm optimization; classification accuracy

1. Introduction

Thousands of applications have driven the development of machine learning; however, extracting useful knowledge from the collected data is becoming increasingly complicated [1, 2]. Data mining is the process of finding meaningful information and models in large-scale data, but in practice the collected data may contain uninformative or even misleading features [3]. Feature selection is one of the major pre-processing techniques for eliminating redundant and irrelevant features from a data set; it can reduce the dimensionality of the data, accelerate the learning process and improve classification accuracy [4].

Feature selection aims to choose an optimal subset of features from the whole data set without losing classification accuracy, or even while improving it, compared with using all the original features [5]. Therefore, feature selection techniques are widely used to reduce the number of features in high-dimensional data sets [6].

Feature selection methods can be broadly categorized into three classes according to how they interact with the classification model: wrappers, filters and embedded methods [7]. Wrapper methods use a learning algorithm as the classifier to decide whether a certain feature should be chosen, and thereby gradually build a robust subset. Several classifiers are widely used in state-of-the-art wrapper approaches, such as machine learning classifiers, the Las Vegas wrapper and neural network-based methods [8]. Filter methods, on the other hand, score the features according to certain criteria, and only the features with high scores are chosen. Popular filter methods in the literature include Information Gain (IG), the Trace Ratio Criterion (TRC) and ReliefF [9]. In addition, embedded methods have been proposed to address the problems of filter-based methods, because they exploit the interaction with the classifier and further consider the correlation between features and labels [10].

The process of selecting the optimal subset from the initial data set, namely "subset generation", is considered the essential aspect of feature selection [11]. Generally speaking, three main search strategies are commonly used in subset generation: complete search, random search and heuristic search. Complete search generates all possible feature subsets of the original data set, evaluates each of them and selects the best one. Thus, if the number of features is $N$, the number of generated subsets is $2^{N}$, which consumes a great deal of computational time and is unacceptable in practice for high-dimensional data sets. Random search is one way to reduce this computation. In this method, a number of feature subsets are randomly generated and evaluated, and the best one is selected. Random search uses no heuristic information to guide the search, since the generation of each subset does not exploit information about the quality of previously generated subsets; hence it is generally an inefficient search method.
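To make the cost gap between these two uninformed strategies concrete, the sketch below contrasts complete search, which enumerates all $2^{N}-1$ non-empty subsets (the empty subset is skipped), with random search, which evaluates a fixed budget of random subsets. It is a minimal illustration under stated assumptions: `evaluate` is any cost function to be minimized (lower is better), and `toy_cost` is a hypothetical stand-in rather than a real classifier.

```python
import itertools
import random
from typing import Callable, Optional, Tuple

Subset = Tuple[int, ...]

def complete_search(n_features: int, evaluate: Callable[[Subset], float]):
    """Evaluate every non-empty subset of range(n_features); the cost grows as 2**n - 1."""
    best_subset: Optional[Subset] = None
    best_score = float("inf")
    evaluated = 0
    for k in range(1, n_features + 1):
        for subset in itertools.combinations(range(n_features), k):
            evaluated += 1
            score = evaluate(subset)
            if score < best_score:  # keep the first subset with the lowest cost
                best_subset, best_score = subset, score
    return best_subset, best_score, evaluated

def random_search(n_features: int, evaluate: Callable[[Subset], float],
                  n_samples: int, seed: int = 0):
    """Evaluate n_samples random subsets; nothing learned from earlier samples is reused."""
    rng = random.Random(seed)
    best_subset: Optional[Subset] = None
    best_score = float("inf")
    for _ in range(n_samples):
        subset = tuple(i for i in range(n_features) if rng.random() < 0.5)
        if not subset:  # skip the empty subset
            continue
        score = evaluate(subset)
        if score < best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score

# Hypothetical toy cost: prefers exactly two features, one of which is feature 0.
def toy_cost(subset: Subset) -> float:
    return abs(len(subset) - 2) + (0 if 0 in subset else 1)

print(complete_search(4, toy_cost))            # ((0, 1), 0, 15)
print(random_search(4, toy_cost, n_samples=8))
print([2 ** n - 1 for n in (10, 20, 30)])      # [1023, 1048575, 1073741823]
```

The last line shows why complete search is impractical: at 30 features it would already require over a billion evaluations, each of which is a full classifier run in a wrapper setting.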

An alternative that may overcome the drawbacks of these two strategies is heuristic search. As described in [12], a heuristic search can be characterized as a depth-first search that uses heuristic information to orient the search towards the goal, and it tends to be significantly more cost-effective than complete and random searches for feature selection. Meta-heuristic search is a special type of heuristic search, defined in [13] as "a upper general template that can be applied as guiding strategies when projecting underlying heuristics to deal with specific optimization problems".

Following the success of earlier meta-heuristic algorithms in solving various optimization problems, more and more nature-inspired algorithms have been proposed in recent years. For example, BES was first proposed by Alsattar et al. [14] in 2020; it simulates the hunting strategy and intelligent social behavior of bald eagles searching for prey. BES can solve various complex numerical optimization problems effectively, and its strong global search ability gives it broad application prospects [15]. However, BES was originally proposed for continuous optimization, so it cannot be applied directly to feature selection, whose solutions are discrete. Moreover, the convergence accuracy of BES is unsatisfactory on some problems [16]. As a new and efficient nature-inspired meta-heuristic method, BES has not yet been applied to feature selection. Therefore, adapting BES to feature selection and improving its performance are the research focus of this work.

On the other hand, PSO is regarded as one of the most well-known meta-heuristic algorithms [17]. PSO was first proposed by Kennedy and Eberhart in 1995 [18] to simulate the foraging behavior of bird flocks in nature. PSO is generally effective in solving various optimization problems owing to its simplicity, wide applicability and low complexity [19]. It has been successfully applied to optimization problems in many fields, such as edge detection, path planning, clustering analysis, power system operation and image segmentation. Nevertheless, PSO suffers from certain drawbacks, such as premature convergence and a tight relation between the values and the variants [20].

In recent years, a large number of hybrid meta-heuristic algorithms have been proposed to solve combinatorial optimization problems. A well-designed combination of meta-heuristic algorithms can unite the advantages of each optimizer and mitigate their individual disadvantages [21]. More specifically, hybridizing a local search process with evolutionary computation can directly refine the solutions found during the search, since evolutionary computation is capable of identifying promising regions of the solution space that may contain more high-quality solutions; the performance of the hybrid algorithm is thus improved to some extent [22]. In summary, a hybrid strategy that combines the strengths of excellent meta-heuristic algorithms is a promising direction for solving the feature selection problem designed in this paper.

To the best of our knowledge, this work is the first attempt to hybridize BES and PSO to solve feature selection problems. The key contributions of this paper are as follows:

(1)
The main purpose of this paper is to improve classification accuracy while reducing the number of selected features. To this end, we design a linear weighting fitness function that jointly handles these two conflicting objectives.
(2)
We solve the designed fitness function by exploiting the advantages of BES and alleviating its shortcomings through hybridization with PSO. First, the main procedures of the original algorithms are briefly presented. Next, we embed PSO into the searching and swooping phases of BES to enhance its exploitation capability. Further, a new chaotic map is adopted to broaden the explored solution space and maintain the diversity of the population in the hybrid method. Finally, a binarization strategy is designed for the hybrid method, since the solution to the feature selection problem is binary.
(3)
To confirm the effectiveness of the proposed hybrid methods, experiments are conducted on 17 classical data sets from the UC Irvine (UCI) Machine Learning Repository, and the methods are compared with five well-established algorithms from the literature in terms of various indicators, such as fitness value, classification accuracy, CPU running time and number of selected features.

The rest of this paper is organized as follows. Section 2 reviews related research on feature selection optimization. Section 3 presents the preliminaries of this paper, including the formulated fitness function and the original methods. The details of the hybrid meta-heuristic algorithms are given in Section 4. Section 5 presents the experimental results and the relevant analysis. Finally, Section 6 concludes this paper and suggests future work. Tables A-1 and A-2 in Appendix A list all the symbols and abbreviations used in this paper.
2. Related work

As early as 1998, Kohavi et al. [23] showed that wrapper-based feature selection is effective at improving the performance of the learning process and eliminating unnecessary information from the data set. In practical applications, wrapper-based methods usually achieve better classification accuracy than filter-based methods [24], so many wrapper-based methods can be found in the literature. However, wrapper methods are usually much slower than filter methods, so filter methods scale better to data sets with a very large number of features.

As a powerful swarm intelligence-based method, PSO has become a research focus for selecting the final feature subset. Sharkawy et al. [25] used PSO to determine the particle type and dimensions in transformer oil. Sakri et al. [26] embedded PSO into various well-known classifiers to increase the accuracy of predicting breast cancer recurrence in women. Another medical application was presented by Inbarani et al. [27], who proposed new supervised feature selection methods for disease diagnosis based on the hybridization of PSO and other PSO-based methods. In [28], Chuang et al. introduced the logistic map and the tent map into a binary PSO model to determine the inertia weight of PSO for feature selection. More details on PSO for feature selection can be found in [29].

Besides PSO, many classical meta-heuristic algorithms have been adapted to feature selection, since they have shown competitive results in selecting informative features compared with other search methods. Examples include binary gray wolf optimization (BGWO) [30], the binary dragonfly algorithm (BDA) [31], unsupervised ant colony optimization (UACO) [32], the binary bat algorithm (BBA) [33] and the binary firefly algorithm (BFA) [34].

Designing new optimizers to improve the performance of meta-heuristic algorithms is a new research trend in feature selection optimization. Mafarja et al. [35] introduced two binary variants of the whale optimization algorithm (WOA) that use tournament and roulette wheel selection mechanisms instead of a random operator in the search process, and they enhanced the exploitation of WOA with crossover and mutation operators. Peng et al. [36] introduced the ant colony optimization (ACO) algorithm into feature selection to effectively eliminate redundant features and prevent the search from falling into a local optimum. In [37], the authors presented several feature selection models based on granular information, namely the improved binary genetic algorithm with feature granulation (IBGAFG), the improved neighborhood rough set with sample granulation (INRSG) and granularity $\lambda$ optimization based on a genetic algorithm (ROGA). Ghosh et al. [38] introduced a binary variant of the sailfish optimizer, named binary sailfish (BSF), which uses a sigmoid function to solve feature selection problems. Li et al. [39] proposed an improved binary dragonfly algorithm (IBDA) that adopts an evolutionary population dynamics (EPD) strategy and a crossover operator to enhance exploitation while maintaining population diversity during the search. In [40], the researchers incorporated chaotic maps into the crow search algorithm (CSA) to overcome its shortcomings, such as a low convergence rate and entrapment in local optima, when solving feature selection problems.

BES was originally proposed mainly to solve complex numerical optimization problems. For example, Bharanidharan et al. [41] proposed an improved BES to increase the accuracy of machine learning techniques in diagnosing dementia, and their results demonstrated that the accuracy of a Support Vector Machine increased from 64% to 88% when BES was used to transform the features. Sayed et al. [42] used BES to find the optimal hyperparameter values of a SqueezeNet architecture in their melanoma skin cancer prediction model, and using BES effectively increased the robustness and efficiency of the model compared with state-of-the-art methods. Zhang et al. [43] proposed polar-coordinate BES (PBES) for the curve approximation problem; they strengthened the exploration and exploitation capabilities of BES by distributing the initial individuals more uniformly and introducing additional parameters into the algorithm. Their experimental results confirmed the superiority of PBES over several well-known meta-heuristic algorithms. However, BES has not been used for feature selection, since it was proposed for problems with a continuous solution space.

Over the past decades, hybridizing meta-heuristic optimizers to exploit the advantages of each has attracted increasing attention in feature selection. Hybrid algorithms have been reported to achieve better performance on specific problems, especially real-life ones [44]. The first hybrid meta-heuristic algorithm for feature selection can be traced back to 2004 [45]; it combined a genetic algorithm (GA) with local search operators to fine-tune the search process. In [46], Nemati et al. proposed a feature selection algorithm that combined the advantages of ACO and GA to maximize predictive accuracy and find the smallest feature subset. Zheng et al. [47] proposed a hybrid filter-wrapper feature subset selection algorithm called maximum spearman minimum covariance cuckoo search (MSMCCS), which aimed to combine the efficiency of filters with the higher accuracy of wrappers. Mafarja et al. [48] developed a wrapper-based feature selection method that combined GWO and WOA to alleviate the drawbacks of both algorithms, and it experimentally outperformed other state-of-the-art approaches in their study. Moreover, Qasim et al. [49] presented a hybrid of BDA and statistical dependence, in which the statistical dependence guided BDA and a $K$-nearest neighbor ($K$NN) classifier was used in the chosen fitness function to evaluate candidate subsets on the data set.

Owing to its simplicity and high performance, PSO is frequently hybridized with other meta-heuristic algorithms for feature selection. For instance, Xue et al. [50] combined PSO and GA to reduce data size and increase the precision and effectiveness on specific data sets. In [51], Ke et al. proposed a hybrid PSO with a spiral-shaped mechanism (HPSO-SSM) that selects the optimal feature subset for classification via a wrapper-based approach. Adamu et al. [52] presented a hybrid binary version of enhanced chaotic CSA and PSO (ECCSPSOA) for feature selection, in which PSO was used to converge to the best global solution in the search space. Tawhid et al. [53] combined the bat algorithm, whose echolocation capability helps explore the feature space, with an enhanced version of PSO, which can converge to the best global solution in the search space. Besides, Tashi et al. [54] proposed a binary version of hybrid GWO and PSO (i.e., BGWOPSO) to find the best feature subset and used it as a wrapper feature selection method.

According to the no free lunch (NFL) theorem [55], no single algorithm can solve all optimization problems optimally. Although the above-mentioned works may solve feature selection problems well, they may be restricted to specific data sets or scenarios; in other words, the performance of these algorithms may degrade on other types of data sets or optimization problems [56]. Therefore, efforts to improve the effectiveness of existing meta-heuristic algorithms are worthwhile. Owing to the high exploration capability of BES, in this work we apply it to feature selection and hybridize it with another optimizer to further exploit its potential and solve feature selection problems more efficiently.

3. Preliminaries

3.1 Fitness function

Feature selection is a multi-objective optimization problem by nature. A common approach to multi-objective optimization is to aggregate the multiple objectives into a single objective, which simplifies the calculations and still yields a proper solution [57]. In most cases, the main purpose of feature selection is to minimize the number of selected features while maximizing the classification accuracy obtained with them [58]. On this basis, we design a linear weighting formulation that jointly optimizes the two objectives:

$\displaystyle\textit{Fitness}(S)=\alpha\times(1-\gamma(S))+\beta\times\frac{\left|\textit{Cn}\right|}{\left|\textit{Tn}\right|}$ (1)

where Fitness(S) denotes the fitness value of a given feature subset $S$ evaluated on the training set (the training set is a part extracted from the whole data set, and the remainder is the test set); $\alpha\in[0,1]$ is the weight of the classification accuracy term, and $\beta=1-\alpha$ is the weight of the feature reduction term. Moreover, $\gamma(S)$ is the classification accuracy of the subset $S$ for a given classifier, Cn is the number of selected features in the subset, and Tn is the total number of features in the data set. Since both terms lie in $[0,1]$ and decrease as the subset improves, a lower fitness value indicates a better subset.
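A minimal Python sketch of Eq. (1) follows. It assumes that the accuracy $\gamma(S)$ has already been obtained from some classifier (how it is computed is outside the sketch) and that a subset is encoded as a binary mask over all Tn features; the function name `fitness` and the default `alpha=0.99` are illustrative assumptions, not values taken from this paper.

```python
from typing import Sequence

def fitness(accuracy: float, mask: Sequence[int], alpha: float = 0.99) -> float:
    """Eq. (1): alpha * (1 - gamma(S)) + beta * |Cn| / |Tn|, with beta = 1 - alpha.

    accuracy -- gamma(S), assumed to come from a classifier on the training set
    mask     -- binary vector over all Tn features; 1 means the feature is selected
    Lower values are better.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    total = len(mask)                         # |Tn|
    selected = sum(1 for bit in mask if bit)  # |Cn|
    beta = 1.0 - alpha
    return alpha * (1.0 - accuracy) + beta * selected / total

# Subset A: 2 of 4 features, 90% accuracy. Subset B: all 4 features, 92% accuracy.
for alpha in (0.99, 0.5):
    a = fitness(0.90, [1, 0, 1, 0], alpha)
    b = fitness(0.92, [1, 1, 1, 1], alpha)
    print(alpha, round(a, 4), round(b, 4))
# 0.99 0.104 0.0892
# 0.5 0.3 0.54
```

With $\alpha=0.99$ the accuracy term dominates, so the larger but slightly more accurate subset B obtains the lower (better) fitness; with $\alpha=0.5$ the ranking flips in favor of the smaller subset A.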

Eq. (1) shows that the two objectives conflict: a subset with higher classification accuracy often requires more selected features, and vice versa. Thus, achieving high classification accuracy while keeping the number of selected features small is a challenging task.

Moreover, feature selection has been described as an NP-hard problem in previous work [59], which means that the designed fitness function is difficult to solve with conventional algorithms. Recent research shows that population-based evolutionary algorithms, which belong to the meta-heuristic family, can solve such NP-hard problems to some extent [60]; we therefore tackle this problem with effective evolutionary algorithms such as BES and PSO.

3.2 Bald eagle search algorithm

As a nature-inspired meta-heuristic algorithm, BES simulates the hunting strategy and intelligent social behavior of bald eagles searching for fish. The hunting process of a bald eagle is divided into three stages. In the first stage (selecting space), the eagles select the space with the largest amount of prey. In the second stage (searching in space), the eagles move within the selected space to find prey. In the third stage (swooping), the eagles swing from the best position identified in the second stage and determine the best hunting point. The movements of the eagles in these stages depend on a central point: in the selecting stage, the eagles move from the center point to the selected search space; in the searching stage, they search around the center point of the search space; and in the swooping stage, they move from the center of the search space towards the prey.

Accordingly, the main procedure of the algorithm can be divided into three parts: selecting phase, searching phase and swooping phase. These behaviors can be defined as follows.

(1)
Selecting phase. In this phase, eagles identify the food and select the best area within the search space according to the amount of food. This behavior is modeled as:

$\displaystyle X_{\textit{new},i}^{t}=X_{\textit{best}}+\nu\times\textit{rand}(0,1)\times(X_{\textit{mean}}^{t}-X_{i}^{t})$ (2)

where $X$ is the position of an eagle, $t$ is the iteration number, $\nu$ is a control weight for the position change, and $\textit{rand}(0,1)$ returns a random number in [0, 1]. In addition, $X_{\textit{best}}$ is the best position found so far, $X_{\textit{mean}}^{t}$ is the mean position of the eagles at iteration $t$, which aggregates the information from the previous points, and $X_{i}^{t}$ is the position of the $i$th eagle at the current iteration.
(2)
Searching phase. In this phase, eagles search for prey within the selected search space and move in different directions along a spiral to speed up the search. The new position of each eagle is expressed as:

$\displaystyle X_{\textit{new},i}^{t}=X_{i}^{t}+\Delta Y$ (3)

where $\Delta Y$ is the step factor in the searching phase and can be expressed as:

$\displaystyle\Delta Y=p(i)\times(X_{i}^{t}-X_{i+1}^{t})+q(i)\times(X_{i}^{t}-X_{\textit{mean}}^{t})$ (4)

where $p(i)$ and $q(i)$ are the inertia coefficients that determine the position of the eagle in polar coordinates; they take values in $[-1,1]$ and are expressed as:

$\displaystyle p(i)=\frac{pr(i)}{\textit{max}(\left|pr\right|)}$ (5)

$\displaystyle q(i)=\frac{qr(i)}{\textit{max}(\left|qr\right|)}$ (6)

where $pr(i)$ and $qr(i)$ are calculated as:

$\displaystyle pr(i)=r(i)\times\textit{sin}(\theta(i))$ (7)

$\displaystyle qr(i)=r(i)\times\textit{cos}(\theta(i))$ (8)

where $\theta(i)$ and $r(i)$ are the polar angle and polar radius of the spiral equation, respectively, which are calculated as:

$\displaystyle\theta(i)=d\times\pi\times\textit{rand}(0,1)$ (9)

$\displaystyle r(i)=\theta(i)+G\times\textit{rand}(0,1)$ (10)

where $d$ determines the corner position during the point search around the center point and takes values in [5,10]. In addition, $G$ takes values in [0.5,2] and determines the number of search cycles.
(3)
Swooping phase. Eagles swing from the best position in the search space to their target prey in this phase. Consequently, all points also tend to move towards the best point. This behavior can be mathematically modeled as:

$\displaystyle X_{\textit{new},i}^{t}=\textit{rand}(0,1)\times X_{\textit{best}}+\Delta Z$ (11)

where $\Delta Z$ is the step factor in the swooping phase and can be expressed as:

$\displaystyle\Delta Z=p^{\prime}(i)\times(X_{i}^{t}-m_{1}\times X_{\textit{mean}}^{t})+q^{\prime}(i)\times(X_{i}^{t}-m_{2}\times X_{\textit{best}})$ (12)

where $m_{1}$ and $m_{2}$ are control weights in [1,2] that increase the movement intensity of the eagle towards the center point and the best point, respectively. In addition, $p^{\prime}(i)$ and $q^{\prime}(i)$ also determine the position of the eagle in polar coordinates and are expressed as:

$\displaystyle p^{\prime}(i)=\frac{pr(i)}{\textit{max}(\left|pr\right|)}$ (13)

$\displaystyle q^{\prime}(i)=\frac{qr(i)}{\textit{max}(\left|qr\right|)}$ (14)

where $pr(i)$ and $qr(i)$ are calculated as:

$\displaystyle pr(i)=r(i)\times\textit{sinh}(\theta(i))$ (15)

$\displaystyle qr(i)=r(i)\times\textit{cosh}(\theta(i))$ (16)

where $\theta(i)$ is defined in Eq. (9) and $r(i)=\theta(i)$ .
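The polar-spiral coefficients and step factors of the three phases can be sketched with NumPy as follows. This is an illustrative vectorized reading of Eqs. (4)-(10) and (12)-(16), not the authors' code; the default parameter values (`d=10`, `G=1.8`, `m1=m2=2`) are simply picked from the stated ranges, and treating $X_{i+1}$ of the last eagle as the first eagle (wrap-around) is our assumption, since the text does not say how that index is handled.

```python
import numpy as np

def search_coefficients(n, d=10.0, G=1.8, rng=None):
    """p(i), q(i) of the searching phase, Eqs. (5)-(10)."""
    rng = np.random.default_rng() if rng is None else rng
    theta = d * np.pi * rng.random(n)   # Eq. (9): polar angle
    r = theta + G * rng.random(n)       # Eq. (10): polar radius
    pr = r * np.sin(theta)              # Eq. (7)
    qr = r * np.cos(theta)              # Eq. (8)
    return pr / np.max(np.abs(pr)), qr / np.max(np.abs(qr))  # Eqs. (5)-(6)

def swoop_coefficients(n, d=10.0, rng=None):
    """p'(i), q'(i) of the swooping phase, Eqs. (13)-(16) with r(i) = theta(i)."""
    rng = np.random.default_rng() if rng is None else rng
    theta = d * np.pi * rng.random(n)   # Eq. (9)
    r = theta
    pr = r * np.sinh(theta)             # Eq. (15)
    qr = r * np.cosh(theta)             # Eq. (16)
    return pr / np.max(np.abs(pr)), qr / np.max(np.abs(qr))  # Eqs. (13)-(14)

def search_step(X, p, q):
    """Delta Y of Eq. (4) for all eagles; X has shape (N, dim)."""
    x_mean = X.mean(axis=0)
    x_next = np.roll(X, -1, axis=0)     # row i holds X_{i+1}; last eagle wraps (assumption)
    return p[:, None] * (X - x_next) + q[:, None] * (X - x_mean)

def swoop_step(X, x_best, p2, q2, m1=2.0, m2=2.0):
    """Delta Z of Eq. (12) for all eagles."""
    x_mean = X.mean(axis=0)
    return p2[:, None] * (X - m1 * x_mean) + q2[:, None] * (X - m2 * x_best)

rng = np.random.default_rng(0)
X = rng.uniform(-5.0, 5.0, size=(6, 3))        # 6 eagles in 3 dimensions
x_best = X[0]                                   # stand-in for X_best
p, q = search_coefficients(6, rng=rng)
X_searched = X + search_step(X, p, q)          # Eq. (3)
p2, q2 = swoop_coefficients(6, rng=rng)
X_swooped = rng.random((6, 1)) * x_best + swoop_step(X, x_best, p2, q2)  # Eq. (11)
```

Because of the normalization in Eqs. (5), (6), (13) and (14), the largest coefficient in absolute value is exactly $\pm 1$, and the hyperbolic functions in the swooping phase make most $p^{\prime}(i)$ and $q^{\prime}(i)$ very small, so most eagles land close to $\textit{rand}(0,1)\times X_{\textit{best}}$.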

The main procedures of BES are shown in Algorithm 3.2.

Algorithm 3.2 BES
Initialize the population size $N$, the population (eagles) $X_{i}$ ($i=1,2,\ldots,N$), the maximum iteration $t_{\textit{max}}$, $X_{\textit{best}}$, etc.
for $t=1$ to $t_{\textit{max}}$ do
  // Selecting phase
  for $i=1$ to $N$ do
    Calculate $X_{\textit{new},i}^{t}$ by Eq. (2)
    Update $X_{\textit{best}}=X_{\textit{new},i}^{t}$ if $\textit{Fitness}(X_{\textit{new},i}^{t})$ is better than $\textit{Fitness}(X_{\textit{best}})$
  end for
  // Searching phase
  for $i=1$ to $N$ do
    Calculate $p(i)$, $q(i)$, $pr(i)$, $qr(i)$, $\theta(i)$ and $r(i)$ by Eqs. (5), (6), (7), (8), (9) and (10), respectively
    Calculate $X_{\textit{new},i}^{t}$ by Eq. (3)
    Update $X_{\textit{best}}=X_{\textit{new},i}^{t}$ if $\textit{Fitness}(X_{\textit{new},i}^{t})$ is better than $\textit{Fitness}(X_{\textit{best}})$
  end for
  // Swooping phase
  for $i=1$ to $N$ do
    Calculate $\theta(i)$, $p^{\prime}(i)$, $q^{\prime}(i)$, $pr(i)$ and $qr(i)$ by Eqs. (9), (13), (14), (15) and (16), respectively, with $r(i)=\theta(i)$
    Calculate $X_{\textit{new},i}^{t}$ by Eq. (11)
    Update $X_{\textit{best}}=X_{\textit{new},i}^{t}$ if $\textit{Fitness}(X_{\textit{new},i}^{t})$ is better than $\textit{Fitness}(X_{\textit{best}})$
  end for
end for
Return $X_{\textit{best}}$ // the best solution obtained by BES
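For experimentation, Algorithm 3.2 can be turned into a compact continuous minimizer, sketched below with NumPy. Several details are assumptions not fixed by the text: each eagle is replaced only when its new position is better (greedy replacement), positions are clipped to the box `[lb, ub]`, `nu=2.0` is an arbitrary choice because no range for $\nu$ is given here, and $X_{i+1}$ wraps around for the last eagle. The objective `f` is any function to be minimized; the sphere function in the example is only a toy benchmark, and using Eq. (1) instead would additionally require a binarization step.

```python
import numpy as np

def _normalize(v):
    """Scale so the largest absolute entry is 1, as in Eqs. (5), (6), (13), (14)."""
    return v / np.max(np.abs(v))

def bes_minimize(f, dim, lb, ub, n=20, t_max=100, nu=2.0, d=10.0, G=1.8,
                 m1=2.0, m2=2.0, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, size=(n, dim))
    fit = np.array([f(x) for x in X])
    k = int(np.argmin(fit))
    x_best, f_best = X[k].copy(), float(fit[k])

    def accept(i, x_new):
        """Greedy replacement of eagle i (assumption) and update of X_best."""
        nonlocal x_best, f_best
        x_new = np.clip(x_new, lb, ub)
        f_new = f(x_new)
        if f_new < fit[i]:
            X[i], fit[i] = x_new, f_new
        if f_new < f_best:
            x_best, f_best = x_new.copy(), float(f_new)

    for _ in range(t_max):
        x_mean = X.mean(axis=0)                       # selecting phase, Eq. (2)
        for i in range(n):
            accept(i, x_best + nu * rng.random() * (x_mean - X[i]))

        x_mean = X.mean(axis=0)                       # searching phase, Eqs. (3)-(10)
        theta = d * np.pi * rng.random(n)
        r = theta + G * rng.random(n)
        p, q = _normalize(r * np.sin(theta)), _normalize(r * np.cos(theta))
        for i in range(n):
            delta_y = p[i] * (X[i] - X[(i + 1) % n]) + q[i] * (X[i] - x_mean)
            accept(i, X[i] + delta_y)

        x_mean = X.mean(axis=0)                       # swooping phase, Eqs. (11)-(16)
        theta = d * np.pi * rng.random(n)             # r(i) = theta(i)
        p2 = _normalize(theta * np.sinh(theta))
        q2 = _normalize(theta * np.cosh(theta))
        for i in range(n):
            delta_z = p2[i] * (X[i] - m1 * x_mean) + q2[i] * (X[i] - m2 * x_best)
            accept(i, rng.random() * x_best + delta_z)
    return x_best, f_best

sphere = lambda x: float(np.sum(x ** 2))
x_star, f_star = bes_minimize(sphere, dim=5, lb=-10.0, ub=10.0)
print(f_star)
```

Only the update of $X_{\textit{best}}$ is prescribed by Algorithm 3.2; the greedy replacement is one common way to decide which positions survive to the next phase.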
3.3 Particle swarm optimization algorithm

PSO was inspired by the swarming, collaborative behavior of birds foraging for food in biological populations. Its core idea is that individuals in the group share information, so that the movement of the whole group evolves from disorder to order in the problem-solving space and eventually obtains the optimal solution of the problem.

PSO can be illustrated by birds foraging randomly in a space. None of the birds knows where the food is, but each knows how far away it is, so the simplest effective strategy is to search the area around the bird nearest to the food. In PSO, a candidate solution is represented by a particle, and the population consists of all the candidate particles. In each iteration, a particle accelerates towards its own best position and towards the global best position found by any particle in the population so far. Thus, if a particle finds a better solution, the remaining particles move towards it, and the population ideally reaches the optimal solution of the problem. To realize this, each particle has two properties, position ($X$) and velocity ($V$); each particle records its personal best solution ($X^{t}_{\textit{pbest}}$) up to the current iteration, and the population records the global best solution ($X_{\textit{gbest}}$) found during the search. All particles update their positions and velocities as follows:

$\displaystyle X^{t+1}_{i}=X^{t}_{i}+V^{t+1}_{i}$ (17)

$\displaystyle V^{t+1}_{i}=\omega\times V^{t}_{i}+c_{1}\times\textit{rand}(0,1)\times(X^{t}_{\textit{pbest}}-X^{t}_{i})+c_{2}\times\textit{rand}(0,1)\times(X_{\textit{gbest}}-X^{t}_{i})$ (18)

where $\omega$ is the inertia coefficient, which takes values in (0, 1) and balances local and global search, and $c_{1}$ and $c_{2}$ are the cognitive factor and the social factor, respectively.

The main procedures of PSO are shown in Algorithm 3.3.

Algorithm 3.3 PSO
Initialize the population size $N$, the population (particles) $X_{i}$ ($i=1,2,\ldots,N$), the maximum iteration $t_{\textit{max}}$, $X_{\textit{gbest}}$, $X_{\textit{pbest}}$, etc.
for $t=1$ to $t_{\textit{max}}$ do
  for $i=1$ to $N$ do
    Calculate the fitness value of the current particle, $\textit{Fitness}(X_{i}^{t})$
    Calculate $V^{t+1}_{i}$ by Eq. (18)
    Calculate $X^{t+1}_{i}$ by Eq. (17)
    if $\textit{Fitness}(X_{i}^{t})<\textit{Fitness}(X_{\textit{pbest}}^{t})$ then
      $X_{\textit{pbest}}^{t}=X_{i}^{t}$; $\textit{Fitness}(X_{\textit{pbest}}^{t})=\textit{Fitness}(X_{i}^{t})$
    end if
    if $\textit{Fitness}(X_{i}^{t})<\textit{Fitness}(X_{\textit{gbest}})$ then
      $X_{\textit{gbest}}=X_{i}^{t}$; $\textit{Fitness}(X_{\textit{gbest}})=\textit{Fitness}(X_{i}^{t})$
    end if
  end for
end for
Return $X_{\textit{gbest}}$ // the best solution obtained by PSO
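Algorithm 3.3 can likewise be sketched as a small NumPy minimizer. Two assumptions differ from the literal listing: the random factors in Eq. (18) are drawn independently per dimension, a common convention, and positions are clipped to `[lb, ub]`. The values `w=0.7` and `c1=c2=1.5` are illustrative defaults, not settings reported in this paper, and the personal and global bests are refreshed after each move.

```python
import numpy as np

def pso_minimize(f, dim, lb, ub, n=20, t_max=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, size=(n, dim))           # particle positions
    V = np.zeros((n, dim))                           # particle velocities
    pbest = X.copy()                                 # personal best positions
    pbest_fit = np.array([f(x) for x in X])
    g = int(np.argmin(pbest_fit))
    gbest, gbest_fit = pbest[g].copy(), float(pbest_fit[g])

    for _ in range(t_max):
        r1 = rng.random((n, dim))                    # per-dimension rand(0,1) (assumption)
        r2 = rng.random((n, dim))
        V = w * V + c1 * r1 * (pbest - X) + c2 * r2 * (gbest - X)   # Eq. (18)
        X = np.clip(X + V, lb, ub)                                  # Eq. (17), clipped
        fit = np.array([f(x) for x in X])
        improved = fit < pbest_fit                   # update personal bests
        pbest[improved] = X[improved]
        pbest_fit[improved] = fit[improved]
        g = int(np.argmin(pbest_fit))                # update the global best
        if pbest_fit[g] < gbest_fit:
            gbest, gbest_fit = pbest[g].copy(), float(pbest_fit[g])
    return gbest, gbest_fit

sphere = lambda x: float(np.sum(x ** 2))
g_star, f_star = pso_minimize(sphere, dim=5, lb=-10.0, ub=10.0)
print(f_star)
```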

4. The proposed hybrid basic bald eagle search-particle swarm optimization (HBBP) and hybrid chaos-based bald eagle search-particle swarm optimization (HCBP)

4.1 Motivation of hybridization

The solution to the feature selection problem is discrete, and its dimension equals the total number of features in the data set (each element indicates whether the corresponding feature is selected), so the solution space is huge when the data set contains many features. On the other hand, PSO, as a computational intelligence technique, has succeeded on many continuous problems, and several well-known binary versions of PSO have been proposed for single- and multi-objective optimization problems. For example, Nguyen et al. [61] proposed a dynamic sticky binary PSO with a dynamic parameter control strategy based on an investigation of exploration and exploitation in binary search spaces. Luh et al. [62] proposed a modified binary PSO that adopts a genotype-phenotype representation for continuum structural topology optimization. Moreover, Chauhan et al. [63] extended Gompertz PSO to binary optimization problems. However, since BES is an emerging intelligent optimization method, little research has focused on developing a binary BES. Because PSO is embedded into BES in this research, the existing binary versions are not suitable for the proposed hybrid method, which motivates us to adapt BES and PSO to the designed fitness function.

Previous studies [42] show that BES has great advantages on some specific optimization problems. More specifically, BES covers the search space well, allowing rapid convergence in the initial iterations. However, BES has been found to lack adequate search efficiency and tends to become stuck in local optima, which can degrade the final outcome [64]. Moreover, the proposers of BES recommend combining it with powerful operators from other meta-heuristic algorithms to improve its performance [14]. PSO, on the other hand, has few parameters to adjust, a simple principle and is easy to implement for most optimization problems. Unfortunately, PSO also has limitations, including poor local search capability, imprecise search accuracy and premature convergence [65]. Since both BES and PSO are effective meta-heuristic algorithms, and PSO is often combined with other algorithms to form new hybrid meta-heuristics, we consider combining the two. By overcoming the shortcomings of the original algorithms and maximizing their advantages, it is possible to quickly find high-quality candidate solutions to the proposed feature selection problem.

In this work, we first propose a combination strategy that merges the two algorithms into one, and then introduce a new chaotic map to adjust the parameter of the hybrid method, in order to balance the exploitation and exploration capabilities of the proposed algorithm.

4.2 HBBP

In this work, we propose a combination of BES and PSO following the taxonomy of low-level teamwork hybrid (LTH) group [13]. More specifically, we embed PSO into BES to improve the convergence accuracy of BES, so as to develop a wrapper-based feature selection method.

4.2.1 Principle of hybridization

BES consists of three phases: the selecting phase identifies the search space, while the searching phase and the swooping phase locate the direction of the optimal solution and finally reach it. Therefore, we embed PSO into the searching and swooping phases of BES.

4.2.2 Strategy of hybridization

In the searching and swooping phases of BES, the newly generated solutions are the best solutions at each iteration, i.e., the outputs of these phases can be regarded as the personal best solutions in PSO. By its nature, BES also maintains a variable, $X_{\textit{best}}$, that indicates the global optimal solution, which is consistent with the definition of $X_{\textit{best}}$ in PSO. Therefore, we incorporate the solution update mechanism of PSO into BES: all parameters and the population of BES serve as the inputs to PSO, and the outputs of PSO in the swooping phase are taken as the final results of the hybrid algorithm.

4.2.3 Solution update method

Based on the above principle and combination strategy, we reformulate Eq. (3) as follows:

$\displaystyle X_{\textit{new},i}^{t}=\omega\times X_{i}^{t}+c_{1}\times\textit{rand}(0,1)\times(\Delta Y-X^{t}_{i})+c_{2}\times\textit{rand}(0,1)\times(X_{\textit{best}}-X_{i}^{t})$ (19)

Similarly, Eq. (11) is reformulated as follows:

$\displaystyle X_{\textit{new},i}^{t}=\omega\times X_{\textit{best}}+c_{1}\times\textit{rand}(0,1)\times(\Delta Z-X^{t}_{i})+c_{2}\times\textit{rand}(0,1)\times(X_{\textit{best}}-X_{i}^{t})$ (20)
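As a reading aid, Eqs (19) and (20) can be sketched in NumPy as below. This is a minimal sketch under stated assumptions, not the authors' code: $\Delta Y$ and $\Delta Z$ (the BES positions of Eqs (3) and (11), defined outside this section) are taken as precomputed inputs, `rand(0,1)` is assumed to be drawn independently per dimension, and the function names are hypothetical. The values $\omega=0.7298$, $c_{1}=1.5$ and $c_{2}=2$ come from Table 4 and Section 5.2.1.

```python
import numpy as np


def searching_update(x_i, delta_y, x_best, omega, c1, c2, rng):
    """Eq. (19): PSO-style position update in the BES searching phase.

    delta_y is the searching-phase position of Eq. (3), assumed to be
    computed elsewhere; it plays the role of the personal best in PSO.
    """
    r1 = rng.random(x_i.shape)  # rand(0,1), drawn per dimension (assumption)
    r2 = rng.random(x_i.shape)
    return omega * x_i + c1 * r1 * (delta_y - x_i) + c2 * r2 * (x_best - x_i)


def swooping_update(x_i, delta_z, x_best, omega, c1, c2, rng):
    """Eq. (20): same form, but the inertia term acts on X_best."""
    r1 = rng.random(x_i.shape)
    r2 = rng.random(x_i.shape)
    return omega * x_best + c1 * r1 * (delta_z - x_i) + c2 * r2 * (x_best - x_i)


rng = np.random.default_rng(0)
x_i = rng.random(5)
delta_y = rng.random(5)  # placeholder for the Eq. (3) output
x_best = rng.random(5)
x_new = searching_update(x_i, delta_y, x_best, 0.7298, 1.5, 2.0, rng)
print(x_new.shape)
```

For HCBP, the same functions apply with the constant `omega` replaced by the chaotic value $\omega\left[t\right]$ of the current iteration, which is exactly the change from Eqs (19) and (20) to Eqs (22) and (23).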

The structure of HBBP is displayed in Fig. 1.

Figure 1.

The structure of HBBP.

4.3 HCBP

In this part, a chaotic map is incorporated into HBBP. It is widely known that the inertia coefficient $\omega$ of PSO plays an important role in balancing the exploitation and exploration capabilities: a large coefficient favors global search, whereas a relatively small coefficient favors local search. An inappropriate coefficient often causes PSO to become trapped in local optima or to converge prematurely [28]. Consequently, a proper value of the inertia coefficient is essential for PSO. However, this value is difficult to set, especially because it depends on the specific optimization problem. In this section, we introduce a novel chaotic map to adjust $\omega$ and thereby improve the performance of HBBP.

4.3.1 Chaotic map

Chaos can be mathematically defined as a semi-random sequence generated by a simple deterministic system. In the optimization field, chaotic maps generate sequences between 0 and 1 that replace pseudo-random number generators. In general, chaos is characterized by nonlinearity, ergodicity, local instability, overall stability and long-term unpredictability [66]. Since chaotic search is generally considered faster than a purely random ergodic search, chaos is adopted in meta-heuristic algorithms, for example to generate a diverse population of candidate solutions in the initialization stage.

In this paper, a chaotic map called Logistic-sine (LS) [67] is used to adjust the value of $\omega$ at each iteration. The map is defined as follows:

$\displaystyle\omega\left[h+1\right]=\left(\lambda\,\omega\left[h\right]\bigl(1-\omega\left[h\right]\bigr)+\frac{(4-\lambda)\sin\bigl(\pi\,\omega\left[h\right]\bigr)}{4}\right)\bmod 1$ (21)

where $h$ is the index of the chaotic sequence, so $\omega\left[h\right]$ is the $h$th element of the sequence; $\lambda\in[0,4]$ is the chaos multiplier; and mod denotes the modulo operation. Figure 2 plots the values of $\omega$ generated by the map over 500 iterations.
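To make Eq. (21) concrete, the short sketch below iterates the map in plain Python. It is an illustrative sketch, not the authors' implementation; the function name is hypothetical, and the tuned values $\lambda=0.3$ and $\omega\left[0\right]=0.3$ are those reported in Section 5.2.1. A sequence of 500 values corresponds to the curve in Figure 2.

```python
import math


def logistic_sine_sequence(omega0, lam, length):
    """Iterate the Logistic-sine map of Eq. (21) and return `length` values.

    omega[h+1] = (lam * w * (1 - w) + (4 - lam) * sin(pi * w) / 4) mod 1
    """
    if not 0.0 <= lam <= 4.0:
        raise ValueError("lambda must lie in [0, 4]")
    seq = [omega0]
    for _ in range(length - 1):
        w = seq[-1]
        nxt = lam * w * (1.0 - w) + (4.0 - lam) * math.sin(math.pi * w) / 4.0
        seq.append(nxt % 1.0)  # Python's % keeps a positive float in [0, 1)
    return seq


# Tuned values from Section 5.2.1: lambda = 0.3, omega[0] = 0.3
omegas = logistic_sine_sequence(0.3, 0.3, 500)
print(omegas[:3])
```

In HCBP, `omegas[t]` would then serve as $\omega\left[t\right]$ at iteration $t$.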

Figure 2.

The curve of $\omega$ using chaotic map.

Accordingly, with the chaotic map, Eqs (19) and (20) are reformulated as Eqs (22) and (23), respectively:

$\displaystyle X_{\textit{new},i}^{t}=\omega\left[t\right]\times X_{i}^{t}+c_{1}\times\textit{rand}(0,1)\times(\Delta Y-X^{t}_{i})+c_{2}\times\textit{rand}(0,1)\times(X_{\textit{best}}-X_{i}^{t})$ (22)

$\displaystyle X_{\textit{new},i}^{t}=\omega\left[t\right]\times X_{\textit{best}}+c_{1}\times\textit{rand}(0,1)\times(\Delta Z-X^{t}_{i})+c_{2}\times\textit{rand}(0,1)\times(X_{\textit{best}}-X_{i}^{t})$ (23)

4.4 Binary optimization

BES and PSO were originally developed to solve continuous optimization problems, whereas the proposed feature selection formulation is discrete; specifically, the solution is binary. For each feature, the solution takes the value 1 (the feature is selected) or 0 (the feature is discarded), and the dimension of the solution equals the number of features in the tested data set. Thus, the proposed hybrid algorithm must be adapted from continuous to binary search for the feature selection problem. In this section, the binary strategy used in the proposed methods is introduced.

A common way to convert a continuous optimization algorithm into a binary one is to adopt a transfer function [68]. For most binary optimization problems, two types of transfer function, named after the shape of their curves, are widely used: the $S$-shape transfer function and the $V$-shape transfer function [17]. Note that the $V$-shape transfer function is often adopted to preserve the integrity of the variables, since it guarantees that the change of a search agent is proportional to its step vector [69]. Moreover, our tests show that the $V$-shape transfer function reduces the number of selected features more effectively in the proposed hybrid algorithm (the comparative results are listed in Section 5.2.2); thus, the $V$-shape transfer function is adopted in this paper.

The transfer rate can be described as follows [70].

$\displaystyle P(\Delta{X}_{t})=\left|1-\frac{2}{1+{e^{\Delta{X}_{t}}}}\right|$ (24)

where $\Delta{X}_{t}$ is the transfer factor in the proposed hybrid algorithm. The curve of $P(\Delta{X}_{t})$ against $\Delta{X}_{t}$ is shown in Fig. 3. Since we propose two hybrid algorithms, each containing three phases, the transfer factor is defined separately for each algorithm below, followed by the transfer function that both share.

Figure 3.

The curve of $P(\Delta{X}_{t})$ against $\Delta{X}_{t}$ .

4.4.1 Transfer factor of HBBP

The transfer factor of HBBP is described as follows.

$\displaystyle\Delta{X}_{t}=\begin{cases}X_{\textit{new},i}^{t},&\text{obtained by Eq. (2) (selecting phase)}\\ X_{\textit{new},i}^{t},&\text{obtained by Eq. (19) (searching phase)}\\ X_{\textit{new},i}^{t},&\text{obtained by Eq. (20) (swooping phase)}\end{cases}$ (25)

4.4.2 Transfer factor of HCBP

The transfer factor of HCBP is defined as follows.

$\displaystyle\Delta{X}_{t}=\begin{cases}X_{\textit{new},i}^{t},&\text{obtained by Eq. (2) (selecting phase)}\\ X_{\textit{new},i}^{t},&\text{obtained by Eq. (22) (searching phase)}\\ X_{\textit{new},i}^{t},&\text{obtained by Eq. (23) (swooping phase)}\end{cases}$ (26)

4.4.3 Transfer function

HBBP and HCBP share a common transfer function, which is applied at each iteration as follows.

$\displaystyle\gamma_{j}^{t}=\begin{cases}1-\varepsilon_{j}^{t},&\text{if }\textit{rand}(0,1)\leqslant P(\Delta{X}_{t+1})\\ \varepsilon_{j}^{t},&\text{otherwise}\end{cases}$ (27)

where $\gamma_{j}^{t}\in X_{\textit{new},i}^{t}$ is the $j$th dimension of the solution obtained in the corresponding phase at the $t$th iteration, and $\varepsilon_{j}^{t}$ is the $j$th dimension of the reference solution at the $t$th iteration, with $\varepsilon_{j}^{t}\in X_{\textit{best}}$ in the selecting phase and $\varepsilon_{j}^{t}\in X_{i}^{t}$ in the searching and swooping phases.

The main procedures of the proposed binary optimization are shown in Algorithm 4.4.3.

Algorithm: Binary Optimization
Initialize the parameters, and obtain the population $X_{t}$ produced by the hybrid algorithm in the previous iteration, the dimension of the solution $N_{\textit{dim}}$, etc.;
for $k=1$ to $N_{\textit{dim}}$ do
    Calculate the transfer factor by Eq. (25) or Eq. (26), according to the hybrid strategy;
    Calculate the transfer rate by Eq. (24);
    Update the value of the $k$th dimension of the solution by Eq. (27);
Return $X_{t+1}$. // $X_{t+1}$ is the updated binary solution for the next iteration
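The binary optimization step can be sketched in NumPy as follows, combining the $V$-shape transfer rate of Eq. (24) with the flip rule of Eq. (27). This is a minimal sketch under stated assumptions rather than the authors' code: the per-dimension loop over $k$ is vectorized, the function names are hypothetical, and the caller is assumed to supply the continuous transfer factor from Eq. (25) or (26) together with the phase-dependent reference solution. Eq. (24) is evaluated through the identity $1-2/(1+e^{x})=\tanh(x/2)$, which gives the same value without overflowing `exp` for large steps.

```python
import numpy as np


def v_transfer(step):
    """Eq. (24): |1 - 2 / (1 + e^x)|, computed as |tanh(x / 2)| (same value, no overflow)."""
    return np.abs(np.tanh(np.asarray(step, dtype=float) / 2.0))


def binarize(step, reference, rng):
    """Eq. (27), vectorised over the k = 1..N_dim loop of the Binary Optimization algorithm.

    step:      continuous transfer factor Delta X from Eq. (25) or (26)
    reference: 0/1 vector; X_best in the selecting phase, X_i^t otherwise
    Each bit of `reference` is flipped with probability P(step).
    """
    p = v_transfer(step)
    flip = rng.random(p.shape) <= p
    return np.where(flip, 1 - reference, reference)


rng = np.random.default_rng(42)
reference = np.array([1, 0, 1, 0, 1])
step = np.array([0.0, 0.3, -2.5, 6.0, -0.1])  # e.g. continuous output of Eq. (19)
print(binarize(step, reference, rng))
```

Because the $V$-shape rate is 0 at a zero step and approaches 1 for large steps in either direction, a bit of the reference solution is kept when the continuous update is small and is likely to be flipped when it is large.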

At this point, all the details of the proposed hybrid algorithms have been introduced, and the general framework of HBBP and HCBP is presented in Algorithm 4.4.3.

Algorithm: The proposed hybrid algorithm (HBBP / HCBP)
Initialize the population size $N$, the maximum iteration $t_{\textit{max}}$, $X_{\textit{best}}$, etc.;
Randomly initialize the population (eagles) $X_{i}$ ($i=1,2,\ldots,N$) and binarize each dimension (0 or 1);
for $t=1$ to $t_{\textit{max}}$ do
    // Selecting phase
    for $i=1$ to $N$ do
        Calculate $X_{\textit{new},i}^{t}$ by Eq. (2) and convert it into binary using the Binary Optimization algorithm;
        Update $X_{\textit{best}}=X_{\textit{new},i}^{t}$ if $\textit{Fitness}(X_{\textit{new},i}^{t})$ is better than $\textit{Fitness}(X_{\textit{best}})$;
    // Searching phase
    for $i=1$ to $N$ do
        Calculate $p(i)$, $q(i)$, $pr(i)$, $qr(i)$, $\theta(i)$ and $r(i)$ by Eqs (5), (6), (7), (8), (9) and (10), respectively;
        HBBP: calculate $X_{\textit{new},i}^{t}$ by Eq. (19) and convert it into binary using the Binary Optimization algorithm;
        HCBP: compute $\omega\left[t\right]$ by Eq. (21), calculate $X_{\textit{new},i}^{t}$ by Eq. (22), then convert it into binary using the Binary Optimization algorithm;
        Update $X_{\textit{best}}=X_{\textit{new},i}^{t}$ if $\textit{Fitness}(X_{\textit{new},i}^{t})$ is better than $\textit{Fitness}(X_{\textit{best}})$;
    // Swooping phase
    for $i=1$ to $N$ do
        Calculate $\theta(i)$, $p^{\prime}(i)$, $q^{\prime}(i)$, $pr(i)$ and $qr(i)$ by Eqs (9), (13), (14), (15) and (16), respectively, and set $r(i)=\theta(i)$;
        HBBP: calculate $X_{\textit{new},i}^{t}$ by Eq. (20) and convert it into binary using the Binary Optimization algorithm;
        HCBP: compute $\omega\left[t\right]$ by Eq. (21), calculate $X_{\textit{new},i}^{t}$ by Eq. (23), then convert it into binary using the Binary Optimization algorithm;
        Update $X_{\textit{best}}=X_{\textit{new},i}^{t}$ if $\textit{Fitness}(X_{\textit{new},i}^{t})$ is better than $\textit{Fitness}(X_{\textit{best}})$;
Return $X_{\textit{best}}$. // $X_{\textit{best}}$ is the best solution found by the hybrid algorithm

4.5 Complexity analysis of the proposed hybrid algorithm

As shown in Algorithm 4.4.3, the algorithm executes three nested loops: over the iterations (up to $t_{\textit{max}}$), over the population (of size $N$) and over the dimensions of a solution ($N_{\textit{dim}}$). Compared with these three loops, other calculations such as parameter initialization, binarization of the population and population updating can be ignored. Therefore, the time complexity of both HBBP and HCBP is $\mathcal{O}(t_{\textit{max}}\times N\times N_{\textit{dim}})$. However, HCBP may consume more time than HBBP, since it additionally computes the chaotic map during the search process. The exact computing time depends on the computing resources, is difficult to predict, and is evaluated empirically in the following section.
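To give a sense of scale, the count below plugs in the settings of Section 5.2.4 ($N=24$, $t_{\textit{max}}=100$) and the largest data set in Table 1 (Lung, $N_{\textit{dim}}=326$), counting one pass over all eagles and all dimensions in each of the three phases. The constant factor 3 disappears in the $\mathcal{O}$ bound; note also that the bound does not separately account for the cost of each fitness evaluation, which involves a $K$NN classification.

```latex
3 \times t_{\textit{max}} \times N \times N_{\textit{dim}}
  = 3 \times 100 \times 24 \times 326
  = 2\,347\,200 \ \text{bit updates per run},
\qquad
3 \times t_{\textit{max}} \times N
  = 3 \times 100 \times 24
  = 7\,200 \ \text{fitness evaluations per run}.
```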

4.6 Solution expression

Solution representation is also an important issue for a problem-solving algorithm in feature selection. Following previous work [71], we use an $N_{\textit{dim}}$-dimensional binary vector to represent the solution obtained by the proposed algorithm, where $N_{\textit{dim}}$ equals the total number of features in a given data set. Each dimension of the solution is set to “1” (the corresponding feature is selected) or “0” (the corresponding feature is discarded). An example of the binary representation of a solution is shown in Fig. 4. Given a population size of $N$, the population of the proposed hybrid algorithm can be expressed as follows:

$\displaystyle\textit{population}=\begin{bmatrix}X_{1}\\ X_{2}\\ \vdots\\ X_{i}\\ \vdots\\ X_{N}\end{bmatrix}=\begin{bmatrix}x_{1}^{1}&x_{2}^{1}&\cdots&x_{N_{\textit{dim}}}^{1}\\ x_{1}^{2}&x_{2}^{2}&\cdots&x_{N_{\textit{dim}}}^{2}\\ \vdots&\vdots&&\vdots\\ x_{1}^{i}&x_{2}^{i}&\cdots&x_{N_{\textit{dim}}}^{i}\\ \vdots&\vdots&&\vdots\\ x_{1}^{N}&x_{2}^{N}&\cdots&x_{N_{\textit{dim}}}^{N}\end{bmatrix}$ (28)

where $X_{i}=(x_{1}^{i},x_{2}^{i},x_{3}^{i},\ldots,x_{N_{\textit{dim}}}^{i})$ denotes the $i$th solution in the population.
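The population matrix of Eq. (28) maps directly onto a 2-D array of 0/1 values, one row per eagle. The sketch below is illustrative only (the helper names are hypothetical); it initializes such a population at random and decodes one solution into the indices of its selected features, using $N=24$ as in Section 5.2.4 and 30 columns as for BreastEW in Table 1.

```python
import numpy as np


def init_population(n, n_dim, rng):
    """Random binary population: row i is the solution X_i of Eq. (28)."""
    return rng.integers(0, 2, size=(n, n_dim))


def selected_features(x_i):
    """Column indices of the features whose bit is 1 in solution x_i."""
    return np.flatnonzero(x_i)


rng = np.random.default_rng(7)
population = init_population(24, 30, rng)  # N = 24 eagles, e.g. BreastEW with 30 features
print(population.shape)                    # (24, 30)
print(selected_features(np.array([1, 0, 1, 1, 0])))  # [0 2 3]
```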

5. Experiments

In this section, the results of the proposed hybrid algorithms are presented. We first introduce the benchmark data sets and then report the parameter values used in the experiments. Next, several algorithms from the literature are compared with the proposed hybrid algorithms from various perspectives. Finally, the results are analyzed and the performance of the algorithms is evaluated.

5.1 Benchmark data sets

In this paper, 17 well-known data sets from the UCI data repository [72] are selected as benchmarks for the experiments. Table 1 lists the details of these data sets in terms of data set name, number of features and number of instances.

Table 1
Benchmark data sets

No.	Names	No. of features	No. of instances
1	Breastcancer	10	699
2	BreastEW	30	569
3	Congress	16	435
4	Exactly	14	1000
5	Exactly-II	14	1000
6	HeartEW	14	294
7	Heart-StatLog	13	270
8	Hepatitis	19	142
9	Hillvalley	100	606
10	Lung	326	72
11	Lymphography	18	148
12	M-of-N	14	1000
13	Movementlibras	90	360
14	Sonar	60	208
15	WDBC	31	569
16	Wine	13	178
17	Zoo	16	101

Figure 4.

An example of binary representation of solution.

5.2 Experimental setup

All experiments are run on a PC with an 8th-generation Intel Core i5 processor (i5-8500U) with a base frequency of 3.00 GHz, 16 GB of random access memory (RAM) and an external 1050 Ti GPU. All algorithms are implemented in Python 3.10, and the Python code of the proposed hybrid algorithms can be freely downloaded at “https://github.com/sungeng207/HBBP-HCBP”.

5.2.1 Parameter tuning

In meta-heuristic feature selection, choosing proper values for the parameters of the algorithm plays an important role, as these parameters are tightly related to the convergence rate and control the whole learning process, and may therefore determine whether the proposed model succeeds. It is worth noting that the proposed hybrid algorithm involves several parameters, such as $\nu$ in Eq. (2), $d$ in Eq. (9), $G$ in Eq. (10) and $\omega$ in Eqs (19) and (20). However, tuning these parameters simultaneously on all data sets is challenging, and a single set of values that suits every data set may not exist. Following [73], we select a representative data set with a medium number of features, Lung-cancer (56 features and 32 instances), to jointly tune the parameters.

First, the cognitive factor $c_{1}$ and the social factor $c_{2}$ must be considered carefully, since they appear in both HBBP and HCBP, namely in Eqs (19), (20), (22) and (23). The tuning tests are conducted on HCBP, since the two hybrid models share a similar structure and HCBP is more practical than HBBP. Both $c_{1}$ and $c_{2}$ vary from 1 to 2 with a step size of 0.25, which results in 25 different combinations of $c_{1}$ and $c_{2}$.

Another pair of parameters that must be considered carefully are the chaos multiplier (i.e., $\lambda$ in Eq. (21)) and the initial value of the chaotic sequence (i.e., $\omega\left[0\right]$) in the $LS$ map. The behavior of a chaotic map is known to depend strongly on its multiplier and initial value, so we conduct extensive experiments to tune $\lambda$ and $\omega\left[0\right]$. First, since the chaotic value replaces the inertia coefficient $\omega$ in Eq. (18), it must lie in (0, 1), so $\omega\left[0\right]$ varies between 0 and 1; specifically, $\omega\left[0\right]$ starts from 0.1 with a step size of 0.2 (i.e., 0.1, 0.3, 0.5, 0.7 and 0.9). In addition, to reduce the number of combinations, $\lambda$ starts from 0.15 and each subsequent value is twice the previous one until it reaches its maximum of 4 (i.e., 0.15, 0.3, 0.6, 1.2, 2.4 and 4), which yields 30 different combinations of $\lambda$ and $\omega\left[0\right]$.
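The two tuning grids described above can be enumerated as follows. This sketch only builds the candidate values (taken from the text and from the row and column headers of Tables 2 and 3); the 30 repeated HCBP runs per combination are not shown, and the helper name `lambda_grid` is hypothetical.

```python
import itertools


def lambda_grid(start=0.15, maximum=4.0):
    """Doubling grid for the chaos multiplier, capped at its maximum of 4."""
    values = []
    lam = start
    while lam < maximum:
        values.append(round(lam, 4))
        lam *= 2
    values.append(maximum)
    return values


c_values = [1.0 + 0.25 * i for i in range(5)]                # 1, 1.25, ..., 2
omega0_values = [round(0.1 + 0.2 * i, 1) for i in range(5)]  # 0.1, 0.3, ..., 0.9

c_grid = list(itertools.product(c_values, c_values))             # (c1, c2) pairs
ls_grid = list(itertools.product(lambda_grid(), omega0_values))  # (lambda, omega[0]) pairs
print(len(c_grid), len(ls_grid))  # 25 30
```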

To reduce randomness, the experiment for each combination is repeated 30 times, and the average results are listed in Tables 2 and 3. HCBP obtains the best accuracy and fitness value when $c_{1}=1.5$ and $c_{2}=2$ (Table 2), and the best accuracy, number of selected features and fitness value when $\lambda=0.3$ and $\omega\left[0\right]=0.3$ (Table 3), although not the shortest CPU running time. In practice, sacrificing some computing time for better performance is usually acceptable, so these parameter values are applied to all data sets.

Table 2
Parameter tuning results of $c_{1}$ and $c_{2}$ in the proposed hybrid algorithm

$c_{1}$	Indicators	$c_{2}=$ 1	$c_{2}=$ 1.25	$c_{2}=$ 1.5	$c_{2}=$ 1.75	$c_{2}=$ 2
1	Accuracy	0.975	0.975	0.975	0.975	0.975
	Features	12.1	11.8	11.7667	11.5667	10.9667
	Fitness	0.0269	0.0269	0.0269	0.0268	0.0267
	Time	55.5624	43.7234	49.8028	49.1307	44.9950
1.25	Accuracy	0.975	0.975	0.975	0.975	0.975
	Features	11.0333	11.8	10.7667	12.1667	11.4
	Fitness	0.0267	0.0269	0.0267	0.0269	0.0268
	Time	52.1259	52.1873	52.0012	51.8587	51.8498
1.5	Accuracy	0.975	0.975	0.975	0.975	0.9758
	Features	11.5667	10.7667	11	11.5667	10.7333
	Fitness	0.0268	0.0267	0.0267	0.0268	0.0258
	Time	52.1264	52.6656	52.8510	52.1264	52.6542
1.75	Accuracy	0.975	0.975	0.975	0.975	0.975
	Features	11.7667	10.6667	10.7667	11.8	10.6
	Fitness	0.0269	0.0267	0.0267	0.0269	0.0266
	Time	54.9641	49.5741	50.3025	54.9566	54.8185
2	Accuracy	0.975	0.975	0.975	0.975	0.975
	Features	12.5	10.6	11.3667	12.3333	11.0667
	Fitness	0.0270	0.0266	0.0268	0.0270	0.0267
	Time	48.5165	48.4954	48.0295	48.4626	48.4468

Table 3

Parameter tuning results of $\lambda$ and $\omega\left[0\right]$ in the $LS$ map

$\lambda$	Indicators	$\omega\left[0\right]$ =0.1	$\omega\left[0\right]=$ 0.3	$\omega\left[0\right]=$ 0.5	$\omega\left[0\right]=$ 0.7	$\omega\left[0\right]=$ 0.9
0.15	Accuracy	0.975	0.975	0.975	0.975	0.975
	Features	10.4	10.9667	11.7667	11.2667	10.7667
	Fitness	0.0266	0.0267	0.0269	0.0268	0.0267
	Time	50.9362	51.4357	51.0033	51.3402	51.1822
0.3	Accuracy	0.975	0.9752	0.975	0.975	0.975
	Features	11.9333	9.8667	11.2667	12.3333	11.5333
	Fitness	0.0269	0.0265	0.0268	0.027	0.0268
	Time	51.0668	50.3441	50.6789	50.555	50.3476
0.6	Accuracy	0.975	0.975	0.975	0.975	0.975
	Features	11.1667	10.1	11.1667	12.3	11.5
	Fitness	0.0267	0.0266	0.0267	0.0269	0.0268
	Time	50.5174	50.6376	50.3359	50.5075	50.9079
1.2	Accuracy	0.975	0.975	0.975	0.975	0.975
	Features	12.3333	12.1667	11.2667	11.1	12.3333
	Fitness	0.027	0.0269	0.0268	0.0267	0.027
	Time	50.4698	50.6978	50.651	53.5005	53.4459
2.4	Accuracy	0.975	0.975	0.975	0.975	0.975
	Features	11.4333	11.0333	12.1667	11.5	10.9333
	Fitness	0.0268	0.0267	0.0269	0.0268	0.0267
	Time	53.7461	43.1472	53.4774	53.5476	53.7517
4	Accuracy	0.975	0.975	0.975	0.975	0.975
	Features	11.4	10.9	10.9333	11.1667	11.0333
	Fitness	0.0268	0.0267	0.0267	0.0267	0.0267
	Time	47.6033	47.941	47.9984	47.9757	47.6404

For the remaining parameters of the proposed hybrid algorithm, we use values recommended in the literature, as listed in Table 4.

Table 4

Parameter settings of the proposed hybrid algorithm

Parameter(s)	The Corresponding Meaning	Value(s)	Equation(s)	Refer to
$d$	Estimating the corner position during the point search around the center point in the searching phase of BES	10	(9)	[14, 42]
$\nu$	The control weight related to the current position of the search agent in the selecting phase of BES	2	(2)
$G$	The number of search cycles in the searching phase of BES	1.5	(10)
$m_{1},m_{2}$	The control weights that increase the movement intensity of the search agent toward the best point (or the center point) in the swooping phase of BES	2	(12)
$\omega$	The inertia coefficient that controls the local search or global search in PSO	0.7298	(19), (20)	[74]
$\alpha$ , $\beta$	The weights of classification accuracy and feature reduction in the proposed fitness function of feature selection	0.99, 0.01	(1)	[35, 38]
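Eq. (1) is defined outside this section. Assuming it takes the common linear-weighted form $F=\alpha\,(1-\textit{Acc})+\beta\,|S|/|D|$, where $|S|$ is the number of selected features and $|D|$ the total number of features, a minimal sketch with the weights of Table 4 is given below. The function name is hypothetical; the assumed form is consistent with, for example, the first entry of Table 2 and the $k=5$ Movementlibras entry of Table 5.

```python
def fitness(accuracy, n_selected, n_total, alpha=0.99, beta=0.01):
    """ASSUMED form of Eq. (1): weighted error rate plus weighted feature ratio.

    Lower is better; alpha and beta follow Table 4.
    """
    return alpha * (1.0 - accuracy) + beta * n_selected / n_total


# First entry of Table 2: accuracy 0.975 with 12.1 of Lung-cancer's 56 features
print(round(fitness(0.975, 12.1, 56), 4))  # 0.0269
```

With $\alpha=0.99$, the error term dominates; the feature-ratio term mainly breaks ties between subsets of similar accuracy.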

5.2.2 $V$-shape versus $S$-shape transfer function

In this part, we compare the effectiveness of two common transfer functions (i.e., the $V$-shape and the $S$-shape) to determine which should be used in the proposed algorithm. The transfer rate of the $V$-shape function is given by Eq. (24), and the transfer rate of the widely used sigmoid ($S$-shape) function is defined as follows [75]:

$\displaystyle P(\Delta{X}_{t})=\frac{1}{1+e^{-\Delta{X}_{t}}}$ (29)

Note that, apart from the transfer rate, the $S$-shape function is applied in the proposed methods in exactly the same way as the $V$-shape function described in Section 4.4. Following the tuning principle of Section 5.2.1, we adopt Lung-cancer to compare the effectiveness of the two transfer functions. Then, without loss of generality, the comparison is repeated on three data sets with a medium number of features in Table 1 (i.e., Hillvalley, Movementlibras and Sonar) to further confirm the results. Figure 5 shows histograms of the four indicators obtained with the $V$-shape and $S$-shape transfer functions. As the results show, the $V$-shape function performs better in terms of classification accuracy, fitness value, number of selected features and CPU computing time on most of the four data sets. In summary, the $V$-shape transfer function is more suitable for the proposed algorithms in this paper.
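To see how the two curves differ, the sketch below evaluates Eq. (29) and Eq. (24) at a few step values. It is an illustration of the formulas only, not the experiment behind Figure 5; the function names are hypothetical, and both functions are computed through `math.tanh` (using $1/(1+e^{-x})=\tfrac{1}{2}(1+\tanh(x/2))$ and $|1-2/(1+e^{x})|=|\tanh(x/2)|$) to avoid overflow for large steps.

```python
import math


def s_transfer(x):
    """Eq. (29): sigmoid 1 / (1 + e^-x), written via tanh to avoid overflow."""
    return 0.5 * (1.0 + math.tanh(x / 2.0))


def v_transfer(x):
    """Eq. (24): |1 - 2 / (1 + e^x)|, which equals |tanh(x / 2)|."""
    return abs(math.tanh(x / 2.0))


for step in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(f"step={step:+.1f}  S={s_transfer(step):.3f}  V={v_transfer(step):.3f}")
```

One observable difference: under the flip rule of Eq. (27), which Section 5.2.2 applies to both functions, a zero step flips a bit with probability 0 under the $V$-shape but 0.5 under the $S$-shape. This is an observation about the curves, not a claim about the cause of the results in Figure 5.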

Figure 5.

Average results of $V$-shape versus $S$-shape transfer function.

5.2.3 Classifier utilized

In this work, we use the $K$NN method as the classifier, since it is easy to implement and has performed effectively in previous works [76].

The core idea of the $K$NN method is that a sample (instance) is assigned the class label held by the majority of its $k$ nearest neighbors in the data set. Accordingly, a data set is divided into two parts: a learning set and a validation set. The learning set consists of the learning data and their corresponding class labels. The features of each sample in the validation set are then compared with those of the learning data, and the validation sample is assigned the class label that receives the most votes among its $k$ nearest neighbors in the learning set.

In the following experiments, the Euclidean distance between validation samples and learning samples is used as the distance measure. In this research, the training set is further divided into a learning set containing 80% of the instances and a validation set containing the remaining 20%, in accordance with [77]. During each iteration, the classifier is trained on the learning set restricted to the selected feature subset, and its performance is evaluated on the validation set.
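The wrapper evaluation described above can be sketched in NumPy as follows: restrict the data to the selected columns, split it 80/20 into learning and validation sets, and classify each validation sample by a majority vote of its $k=5$ nearest learning samples under Euclidean distance. This is a minimal sketch under stated assumptions, not the authors' implementation: the split is drawn from a fixed seed on each call, labels are assumed to be non-negative integers, an empty subset is assumed to score zero accuracy, and the function name is hypothetical.

```python
import numpy as np


def knn_validation_accuracy(features, labels, mask, k=5, learn_ratio=0.8, seed=0):
    """Validation accuracy of k-NN using only the features selected by `mask`.

    features: (n_samples, n_features) array; labels: non-negative integer class ids;
    mask: 0/1 vector of length n_features (a solution X_i).
    """
    cols = np.flatnonzero(mask)
    if cols.size == 0:
        return 0.0  # assumption: an empty feature subset scores zero accuracy
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(labels))
    n_learn = int(learn_ratio * len(labels))         # 80% learning set
    learn, valid = order[:n_learn], order[n_learn:]  # remaining 20% for validation
    x_learn = features[np.ix_(learn, cols)]
    x_valid = features[np.ix_(valid, cols)]
    # Euclidean distance from every validation sample to every learning sample
    dists = np.linalg.norm(x_valid[:, None, :] - x_learn[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    votes = labels[learn][nearest]                   # (n_valid, k) neighbour labels
    # Majority vote; bincount(...).argmax() breaks ties toward the smaller label
    preds = np.array([np.bincount(row).argmax() for row in votes])
    return float(np.mean(preds == labels[valid]))


rng = np.random.default_rng(3)
labels = np.repeat([0, 1], 50)
informative = labels * 10.0 + rng.normal(0.0, 0.1, size=100)  # separates the classes
noise = rng.normal(0.0, 1.0, size=100)                         # carries no class signal
X = np.column_stack([informative, noise])
print(knn_validation_accuracy(X, labels, np.array([1, 0])))
```

The returned accuracy is the quantity that, combined with the selected-feature ratio, enters the fitness of a candidate solution.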

For fairness, we originally planned to also use Lung-cancer to tune the number of neighbors of the $K$NN classifier, so as to make the most of the proposed methods. Unfortunately, Lung-cancer has only 32 instances, so more than 32 neighbors cannot be tested on this data set. Therefore, we adopt three alternative data sets with a medium number of features (i.e., Hillvalley, Movementlibras and Sonar) to determine the number of neighbors ($k$) of the $K$NN classifier in preliminary experiments. The average results of each indicator for various values of $k$ are listed in Table 5. The results show that the $K$NN method with $k=5$ achieves the best accuracy and fitness value on all three data sets. Hence, the number of neighbors is set to 5 in the rest of this work.

Table 5
Average results of various number of neighbors ( $k$ ) of $K$ NN classifier in the proposed hybrid algorithm

data set	Indicators	$k=$ 3	$k=$ 5	$k=$ 10	$k=$ 20	$k=$ 30	$k=$ 40
Hillvalley	Accuracy	0.6060	0.6319	0.6049	0.5878	0.5661	0.5562
	Features	44.9667	44.1333	43.8667	43.9333	43.3667	39.7667
	Fitness	0.3945	0.3688	0.3956	0.4125	0.4339	0.4434
	Time	188.6473	268.0072	282.9984	197.0484	198.7826	195.3929
Movementlibras	Accuracy	0.8195	0.8606	0.7197	0.5759	0.5249	0.4892
	Features	41.4000	39.6000	41.3667	37.3667	40.5667	38.8333
	Fitness	0.1833	0.1424	0.2821	0.4240	0.4748	0.5100
	Time	215.8122	156.6413	202.8323	161.1943	161.3513	157.7364
Sonar	Accuracy	0.9130	0.9156	0.8835	0.8513	0.8408	0.8202
	Features	25.1667	25.7333	21.4000	21.0333	21.3667	22.0667
	Fitness	0.0903	0.0879	0.1189	0.1507	0.1612	0.1817
	Time	123.1750	89.0292	79.5277	121.2755	119.6377	119.8501

5.2.4 Comparative algorithms

In this paper, six optimization algorithms are adopted for comparison with HBBP and HCBP, namely BES [14], BPSO [78], BGWOPSO [54], BBA [33], BDA [79] and BGWO [30]. All six are wrapper-based meta-heuristic algorithms that use the $K$NN method as the classifier. Note that BGWOPSO is a hybrid algorithm that combines the strengths of both GWO and PSO. To avoid accidental results, each algorithm is run 30 times independently on the same 17 benchmark data sets (varying the random seed used to initialize the population across runs), and the average results are presented below. In this work, we use a relatively small population size of 24 and a maximum of 100 iterations, as recommended in [39]; these settings were sufficient to achieve good results in our experiments, since most of our data sets do not have a very large number of features. However, for data sets with many more features (and hence a much larger search space), a larger population size might be needed to achieve good results. Further key parameter settings of the comparative algorithms are shown in Table 6.

Table 6
Key parameter settings of the comparative algorithms

No.	Algorithm	Parameter settings
1	BES	The same as Table 4
2	BPSO	$c1=$ 2, $c2=$ 2
3	BGWOPSO	$c1=c2=$ 2, $c3=$ 2, $w=$ 0.5 $+$ rand(0,1)/2
4	BBA	$A=$ 0.25, $Q_{\textit{max}}=$ 2, $Q_{\textit{min}}=$ 0
5	BDA	$w=$ [0.9, 0.4], $s=$ [0.2, 0], $a=$ [0.2, 0],
		$c=$ [0.2, 0], $f=$ [0.2, 0], $e=$ [0, 0.1]
6	BGWO	$\alpha=$ [2, 0]

5.2.5 Evaluation metrics

In the following, the performance of each algorithm is evaluated on four indicators: fitness value, classification accuracy, number of selected features and CPU running time. For each indicator, two statistical measures summarize the results of the repeated experiments: the average value (AVG) and the standard deviation (STD). In addition, two non-parametric statistical tests, namely the Wilcoxon rank-sum test and the Friedman test, are performed at the 5% significance level on the fitness value and the classification accuracy to assess the significance of the differences in these two indicators. We adopt these non-parametric tests because they do not assume that the results are normally distributed, which may provide more reliable comparative conclusions in practice [80].
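The Friedman test compares methods through their average ranks across data sets, as in the “Rank (F-test)” row of Table 7. A minimal pure-Python sketch of the average-rank computation, with tied scores sharing the mean of the ranks they occupy, is given below; the function name is hypothetical. The test statistics and $p$-values themselves are available in SciPy as `scipy.stats.friedmanchisquare` and `scipy.stats.ranksums`. Because Table 7 prints values rounded to four decimals, ties in the printed values may not be ties in the raw results, so ranks computed from the printed table can differ slightly from the reported row.

```python
def average_ranks(table):
    """Average (Friedman) ranks of each method; lower score = better = rank 1.

    table: one row per data set, each row holding one score per method.
    Tied scores share the mean of the ranks they occupy.
    """
    n_methods = len(table[0])
    totals = [0.0] * n_methods
    for row in table:
        order = sorted(range(n_methods), key=lambda j: row[j])
        pos = 0
        while pos < n_methods:
            end = pos
            # extend the block while the next method has the same score
            while end + 1 < n_methods and row[order[end + 1]] == row[order[pos]]:
                end += 1
            shared_rank = (pos + end) / 2 + 1  # mean of ranks pos+1 ... end+1
            for j in order[pos:end + 1]:
                totals[j] += shared_rank
            pos = end + 1
    return [t / len(table) for t in totals]


# Columns: HCBP, HBBP, BES, BPSO; rows: Congress, Lung, Sonar (AVG fitness from Table 7)
rows = [
    [0.0243, 0.0242, 0.0244, 0.0250],
    [0.0904, 0.0924, 0.0986, 0.1054],
    [0.0831, 0.0844, 0.0878, 0.1010],
]
print(average_ranks(rows))
```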

Moreover, the overall wins on the four indicators are used to give a fairer count of wins for each method. A “win” is awarded to the method(s) achieving the best result on a data set, and the win is normalized to 1 per data set across all compared methods. Therefore, a sole winner receives a win of 1, whereas when several methods tie for the best result, the single win is divided equally among them, so each receives a fractional win inversely proportional to the number of tied methods. For example, if all four compared methods achieve the best result on a data set, each receives a win of 0.25. A method's overall wins are the sum of its wins over all data sets. An example is visualized in Fig. 6, wherein “Method1”, “Method2” and “Method4” all achieve the best result (i.e., “result1”, “result2” and “result4”, respectively); hence each of these three methods receives a win of 0.33 (1/3).
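The fractional-win rule can be stated compactly in code. The sketch below is illustrative (the function name is hypothetical); it assumes lower scores are better, as for fitness value, number of features and running time, so accuracy scores would be negated before being passed in. Exact equality is used to detect ties, which matches the rounded values reported in the tables.

```python
def fractional_wins(table):
    """Overall wins: one win per data set, split equally among methods tied for the best.

    table: one row per data set, one score per method; lower is better
    (negate scores first for indicators such as accuracy, where higher is better).
    """
    wins = [0.0] * len(table[0])
    for row in table:
        best = min(row)
        winners = [j for j, value in enumerate(row) if value == best]
        for j in winners:
            wins[j] += 1.0 / len(winners)
    return wins


# The Fig. 6 situation: Method1, Method2 and Method4 tie for the best result
print(fractional_wins([[0.10, 0.10, 0.25, 0.10]]))
```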

Figure 6.

An example of “win” of the comparative methods in one data set.

Table 7

Fitness function values obtained by HCBP, HBBP, BES and BPSO

Data set	HCBP		HBBP		BES		BPSO
	AVG	STD	AVG	STD	AVG	STD	AVG	STD
Breastcancer	0.0272	0	0.0272	0	0.0272	0	0.0272	0
BreastEW	0.0389	0	0.0389	0	0.0389	0	0.0412	0.0007
Congress	0.0243	0.0025	0.0242	0.0023	0.0244	0.0027	0.0250	0.0021
Exactly	0.0046	0	0.0046	0	0.0046	0	0.0046	0
Exactly-II	0.2099	0.0012	0.2104	0.0042	0.2110	0.0053	0.2097	0
HeartEW	0.1493	0.0016	0.1496	0.0023	0.1493	0.0016	0.1494	0.0012
Heart-StatLog	0.1416	0	0.1416	0	0.1417	0.0001	0.1419	0.0008
Hepatitis	0.2619	0.0042	0.2633	0.0056	0.2625	0.0047	0.2714	0.0057
Hillvalley	0.3936	0.0062	0.3934	0.0046	0.3925	0.0058	0.4005	0.0059
Lung	0.0904	0.0107	0.0924	0.0119	0.0986	0.0136	0.1054	0.0106
Lymphography	0.5574	0.0107	0.5608	0.0096	0.5613	0.0077	0.5610	0.0077
M-of-N	0.0046	0	0.0046	0	0.0046	0	0.0048	0.0005
Movementlibras	0.1822	0.0044	0.1807	0.0041	0.1819	0.0038	0.1870	0.0056
Sonar	0.0831	0.0091	0.0844	0.0095	0.0878	0.0099	0.1010	0.0080
WDBC	0.0389	0	0.0389	0	0.0389	0	0.0416	0.0007
Wine	0.0479	0.0001	0.0481	0.0008	0.0486	0.0023	0.0479	0.0002
Zoo	0.0625	0.0067	0.0607	0.0045	0.0608	0.0048	0.0638	0.0047
Overall wins	7.49		4.99		2.99		1.5
Rank (F-test)	1.8824		2.1471		2.5588		3.4118

5.3 Experiment results

In this section, the results are presented by category and then analyzed and discussed. To evaluate the effectiveness of the proposed hybrid algorithms, we first compare the two hybrid strategies with the original algorithms (i.e., BES and BPSO). Next, we compare the better-performing hybrid strategy with state-of-the-art algorithms (i.e., BGWOPSO, BBA, BDA and BGWO). The best results of each experiment are highlighted in bold.

5.3.1 The results of HCBP, HBBP, BES and BPSO

Table 7 displays the fitness values of HCBP, HBBP, BES and BPSO. HCBP achieves the best result on 13 data sets (76.47%), so its overall wins rank first (44.06% of all wins), showing a clear advantage over the others. HBBP obtains 4.99 overall wins (29.35%), followed by BES with 2.99 overall wins, while BPSO ranks fourth with 1.5 wins (8.82%). The STD results show that all four algorithms are quite stable, with small variation across runs on each data set. Both the overall wins and the Friedman (F-test) average rank of the fitness values place HCBP first, which further supports the superiority of HCBP. We attribute the advantage of HCBP over HBBP to the chaotic adjustment of the position update in this hybrid strategy, which broadens the explored solution space and maintains population diversity at each iteration, making HCBP more likely than HBBP to find optimal solutions. In addition, both HCBP and HBBP show certain advantages over the original algorithms, which indicates that the hybrid framework is effective. In other words, the hybrid methods avoid being trapped in local optima better than the original algorithms, thus improving the balance between exploration and exploitation relative to BES and BPSO.

To assess whether the differences in fitness values are statistically significant, we conduct the Wilcoxon rank-sum test for each proposed hybrid method against each original optimizer; the results are displayed in Table 8. HCBP and HBBP each outperform BPSO significantly on 9 data sets. Some comparisons are uninformative: when the two samples are statistically indistinguishable, the test returns p = 1. For readability, these values are replaced with "NaN" in Table 8 and in the corresponding significance tables in the rest of this paper. Ignoring these uninformative comparisons, HCBP and HBBP both outperform BPSO on more than half of the remaining data sets. This demonstrates that both hybrid methods are effective on the formulated feature selection problem. Moreover, the chaos used in the hybridization further improves the fitness values: HCBP is significantly better than BES on three data sets (Lung, Sonar and Wine), whereas HBBP is not significantly better than BES on any data set. Interestingly, the results also suggest that BES is itself an effective method for this feature selection problem, more so than BPSO, since on the majority of data sets its results do not differ significantly from those of the hybrid methods.
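As a rough illustration of how the p-values in Tables 8, 10, 15 and 17 can be obtained from the 30 runs of two algorithms, the sketch below implements the two-sided Wilcoxon rank-sum test with the normal approximation (the same approximation used by SciPy's `scipy.stats.ranksums`, without a tie correction) and applies the NaN convention described above. It is a minimal sketch under these assumptions, not the authors' code; all names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_p(x, y):
    """Two-sided rank-sum p-value via the normal approximation (no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                       # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    return math.erfc(abs(z) / math.sqrt(2))   # equals 2 * (1 - Phi(|z|))


def report(x, y, alpha=0.05):
    """Return (p, significant); p == 1 (z == 0) is reported as NaN, as in the tables."""
    p = rank_sum_p(x, y)
    if p == 1.0:
        return float("nan"), False
    return p, p <= alpha
```

For example, two sets of 30 identical fitness values give z = 0 and therefore NaN, while two fully separated samples give a small p-value.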

Table 8
$p$-values of the Wilcoxon rank-sum test for the fitness values obtained by HCBP and HBBP versus the basic optimizers ($p\leqslant 0.05$ is significant and shown in bold; NaN: not applicable)

Data set	HCBP		HBBP
	BES	BPSO	BES	BPSO
Breastcancer	NaN	NaN	NaN	NaN
BreastEW	NaN	9.42E-13	NaN	9.42E-13
Congress	8.05E-01	1.09E-01	8.43E-01	8.05E-02
Exactly	NaN	NaN	NaN	NaN
Exactly-II	5.31E-01	3.17E-01	5.70E-01	3.17E-01
HeartEW	3.17E-01	3.30E-01	1.54E-01	7.11E-01
Heart-StatLog	3.17E-01	2.06E-02	3.17E-01	2.06E-02
Hepatitis	7.35E-01	2.51E-09	6.48E-01	4.63E-07
Hillvalley	4.46E-01	2.60E-05	4.16E-01	2.00E-06
Lung	1.66E-02	7.57E-07	5.64E-02	1.40E-05
Lymphography	4.20E-01	2.40E-01	5.64E-01	5.26E-01
M-of-N	NaN	7.81E-02	NaN	7.81E-02
Movementlibras	4.16E-01	1.03E-04	3.18E-01	3.00E-06
Sonar	3.08E-02	1.93E-09	1.02E-01	1.61E-08
WDBC	NaN	1.06E-12	NaN	1.06E-12
Wine	4.24E-02	5.57E-01	6.61E-01	2.22E-01
Zoo	8.84E-01	2.80E-02	7.97E-01	2.79E-03

Figures 7 and 8 show the convergence curves of the two proposed hybrid methods and the original algorithms. The curves are taken from the 15th run of each experiment. In most cases, the hybrid methods converge faster than the original algorithms. In particular, HCBP converges faster than HBBP on Hillvalley, Lung, Lymphography and Movementlibras. These results support the conclusion that the additional factor introduced in HCBP (i.e., chaos) may speed up convergence relative to HBBP. Therefore, HCBP tends to find the optimal solution faster than the other compared algorithms.
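A convergence curve plots, for each iteration, the best fitness found so far in one run. The sketch below, with hypothetical helper names and made-up fitness values, builds such a curve as a running minimum and reports the first iteration at which it reaches a target value, one simple way to make "faster convergence" between two curves quantitative.

```python
def convergence_curve(fitness_per_iter):
    """Best-so-far (running minimum) fitness after each iteration of one run."""
    curve, best = [], float("inf")
    for f in fitness_per_iter:
        best = min(best, f)
        curve.append(best)
    return curve


def iterations_to_reach(curve, target, tol=1e-12):
    """First iteration (1-based) at which the curve is within tol of target, else None."""
    for t, value in enumerate(curve, start=1):
        if value <= target + tol:
            return t
    return None


# Made-up per-iteration best fitness values for a single run.
curve = convergence_curve([0.30, 0.25, 0.27, 0.21, 0.21, 0.20])
```

Because Figures 7 to 10 each show a single run, such a comparison describes that run only; averaging the curves over all 30 runs would give a more robust picture.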

Figure 7.

Convergence curves obtained by HCBP, HBBP, BES and PSO on the first 9 data sets.

Figure 8.

Convergence curves obtained by HCBP, HBBP, BES and PSO on the last 8 data sets.

Table 9 reports the classification accuracy obtained by HCBP, HBBP and the original algorithms. The distribution of the best results is similar to that in Table 7. In total, HCBP obtains 7.32 overall wins, the highest among the compared algorithms. HBBP ranks second with 3.82 overall wins, followed by BES with 3.32, and BPSO obtains the fewest wins. These overall wins on classification accuracy support the judgments above and further show the advantages of the hybrid algorithms over the original ones. BES also attains a higher average accuracy than BPSO on most data sets, for example BreastEW, Heart-StatLog, Hillvalley, M-of-N and WDBC, which indicates that BES handles the designed feature selection problem more effectively to some degree. The STD results show that all the compared algorithms are similarly stable: HBBP and BPSO each achieve the lowest STD on 8 data sets, and HCBP and BES each on 9. In addition, the F-test results support the superiority of HCBP over the others. Therefore, from the results on fitness value and classification accuracy, we conclude that both hybrid strategies are competitive with the original algorithms, and that the chaos used in HCBP noticeably improves the performance of the proposed method, which helps it find better solutions.
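The fractional overall wins in Tables 7, 9 and 12 are consistent with a rule in which each data set awards one win, shared equally among all algorithms tied for the best value: with 1/3 rounded to 0.33, this rule reproduces the 7.32 wins of HCBP and the 3.82 wins of HBBP in Table 9. The F-test rank is assumed here to be the Friedman mean rank over data sets, with tied algorithms sharing the average rank. The sketch applies both rules to four rows of Table 9; it illustrates these assumptions and is not the authors' scoring code.

```python
from fractions import Fraction

ALGOS = ["HCBP", "HBBP", "BES", "BPSO"]
# Average accuracies for four rows of Table 9 (higher is better).
ROWS = {
    "Breastcancer": [0.9786, 0.9786, 0.9786, 0.9786],
    "Congress":     [0.9795, 0.9795, 0.9794, 0.9788],
    "Hepatitis":    [0.7387, 0.7371, 0.7380, 0.7296],
    "Wine":         [0.9556, 0.9554, 0.9548, 0.9556],
}


def overall_wins(rows):
    """Each data set awards one win, split equally among the algorithms tied for best."""
    n = len(next(iter(rows.values())))
    wins = [Fraction(0)] * n
    for values in rows.values():
        best = max(values)
        winners = [i for i, v in enumerate(values) if v == best]
        for i in winners:
            wins[i] += Fraction(1, len(winners))
    return wins


def friedman_mean_ranks(rows):
    """Mean rank per algorithm over data sets (rank 1 = best); ties share the mean rank."""
    n = len(next(iter(rows.values())))
    totals = [0.0] * n
    for values in rows.values():
        for i, v in enumerate(values):
            better = sum(1 for w in values if w > v)   # strictly better algorithms
            tied = sum(1 for w in values if w == v)    # includes the algorithm itself
            totals[i] += better + (tied + 1) / 2
    return [t / len(rows) for t in totals]
```

On these four rows, HCBP obtains 9/4 wins and a mean rank of 1.625. Reproducing the full Table 9 values requires all 17 rows, and the mean ranks may differ slightly if the authors ranked unrounded accuracies.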

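Table 10 reports the significance of the differences in classification accuracy between the hybrid methods and the basic optimizers. HCBP and HBBP differ significantly from BPSO on 7 and 8 data sets, respectively, and HCBP also differs significantly from BES on Lung and Sonar, in each case with a higher average accuracy (Table 9). These results confirm the advantage of the hybrid methods in achieving higher classification accuracy than the original algorithms. The other conclusions are similar to the discussion of Table 8.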

Table 9

Classification accuracy obtained by HCBP, HBBP, BES and PSO

Data set	HCBP		HBBP		BES		BPSO
	AVG	STD	AVG	STD	AVG	STD	AVG	STD
Breastcancer	0.9786	0	0.9786	0	0.9786	0	0.9786	0
BreastEW	0.9614	0	0.9614	0	0.9614	0	0.9611	0.0007
Congress	0.9795	0.0027	0.9795	0.0025	0.9794	0.0029	0.9788	0.0021
Exactly	1	0	1	0	1	0	1	0
Exactly-II	0.7888	0.0009	0.7884	0.0033	0.7879	0.0041	0.7890	0
HeartEW	0.8531	0.0012	0.8529	0.0017	0.8531	0.0012	0.8529	0.0014
Heart-StatLog	0.8593	0	0.8593	0	0.8593	0	0.8591	0.0007
Hepatitis	0.7387	0.0041	0.7371	0.0054	0.7380	0.0047	0.7296	0.0057
Hillvalley	0.6070	0.0060	0.6069	0.0044	0.6081	0.0055	0.6002	0.0060
Lung	0.9133	0.0109	0.9113	0.0120	0.905	0.0138	0.8983	0.0108
Lymphography	0.4411	0.0108	0.4376	0.0097	0.4371	0.0078	0.4376	0.0075
M-of-N	1	0	1	0	1	0	0.9999	0.0003
Movementlibras	0.8205	0.0044	0.8219	0.0041	0.8208	0.0038	0.8158	0.0057
Sonar	0.9203	0.0093	0.9190	0.0095	0.9157	0.0099	0.9025	0.0081
WDBC	0.9614	0	0.9614	0	0.9614	0	0.9611	0.0007
Wine	0.9556	0	0.9554	0.0010	0.9548	0.0024	0.9556	0
Zoo	0.9415	0.0066	0.9433	0.0046	0.9433	0.0046	0.9406	0.00461
Overall wins	7.32		3.82		3.32		2.5
Rank (F-test)	1.8824		2.3824		2.5000		3.2353

Table 10

$p$-values of the Wilcoxon rank-sum test for the classification accuracy obtained by HCBP and HBBP versus the basic optimizers ($p\leqslant 0.05$ is significant and shown in bold; NaN: not applicable)

Data set	HCBP		HBBP
	BES	BPSO	BES	BPSO
Breastcancer	NaN	NaN	NaN	NaN
BreastEW	NaN	2.06E-02	NaN	2.06E-02
Congress	9.36E-01	1.89E-01	9.87E-01	2.02E-01
Exactly	NaN	NaN	NaN	NaN
Exactly-II	5.31E-01	3.17E-01	5.70E-01	3.17E-01
HeartEW	3.17E-01	3.21E-01	1.54E-01	6.88E-01
Heart-StatLog	NaN	3.17E-01	NaN	3.17E-01
Hepatitis	6.60E-01	1.35E-08	3.99E-01	4.00E-06
Hillvalley	4.31E-01	3.40E-05	3.64E-01	5.00E-06
Lung	1.19E-02	8.00E-06	5.77E-02	1.35E-04
Lymphography	2.40E-01	3.04E-01	9.87E-01	9.25E-01
M-of-N	NaN	7.80E-02	NaN	7.80E-02
Movementlibras	4.58E-01	1.76E-04	3.26E-01	1.20E-05
Sonar	4.78E-02	2.33E-09	1.21E-01	2.17E-08
WDBC	NaN	1.05E-02	NaN	1.05E-02
Wine	7.81E-02	NaN	2.97E-01	3.17E-01
Zoo	2.39E-01	2.08E-01	NaN	1.23E-02

Table 11 reports the average CPU running time over the 30 runs of each algorithm. HBBP shows a clear advantage over HCBP and BPSO, which exceeded our expectations before the experiments. As a natural side effect, hybrid algorithms often consume more computational time than the original algorithms because they introduce additional operators to improve performance. For HCBP, the main change relative to HBBP is that chaos is embedded into the searching and swooping phases, which inevitably increases the computational time in practice; the chaos also lets HCBP explore more of the solution space during the search, at the cost of additional time. On the other hand, HBBP is faster than BES on slightly more data sets (9 versus 8 overall wins), presumably because its searching and swooping phases are guided toward the best solutions found so far, which may reduce the computational time during execution. Interestingly, BES obtains far more overall wins than BPSO (8 versus 0), which indicates that BES may solve the formulated feature selection problem faster than some earlier meta-heuristic methods. The F-test likewise ranks HBBP (1.5882) ahead of HCBP (2.6471). Note, however, that the difference in CPU running time between HCBP and HBBP is modest: it is below 1 s on Breastcancer, BreastEW, HeartEW and Lymphography, below 5 s on most of the other data sets, and largest on Exactly-II (about 56 s) and Sonar (about 14 s). Therefore, although HCBP usually consumes more time than HBBP, the extra cost is acceptable when the problem calls for higher classification accuracy.
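Table 11 averages the CPU time of 30 independent runs. A minimal way to collect such statistics is sketched below; `dummy_optimizer` is a hypothetical stand-in for one optimizer run, and reporting the sample standard deviation is our assumption, since the paper does not state which estimator is used.

```python
import statistics
import time


def time_runs(run_once, n_runs=30):
    """Mean and sample standard deviation of CPU time (s) over n_runs seeded runs."""
    times = []
    for seed in range(n_runs):
        start = time.process_time()          # CPU time of this process, not wall-clock
        run_once(seed)
        times.append(time.process_time() - start)
    return statistics.mean(times), statistics.stdev(times)


def dummy_optimizer(seed):
    """Hypothetical stand-in for one optimizer run; it only burns a little CPU."""
    total = 0
    for i in range(20000 + seed):
        total += i * i
    return total


avg, std = time_runs(dummy_optimizer, n_runs=5)
```

Using process time rather than wall-clock time keeps the measurement less sensitive to other processes on the machine, although absolute values still depend on hardware and implementation language.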

Table 11

CPU running time (in seconds) obtained by HCBP, HBBP, BES and PSO

Data set	HCBP		HBBP		BES		BPSO
	AVG	STD	AVG	STD	AVG	STD	AVG	STD
Breastcancer	101.2608	0.5065	100.9140	1.4142	102.3795	0.3189	106.2460	4.3347
BreastEW	93.5707	1.4670	93.7159	1.3615	93.1886	1.7517	160.6842	8.6339
Congress	84.0238	1.6491	82.3157	1.8081	84.3368	1.3214	85.9116	0.4328
Exactly	144.9429	2.6347	138.1031	1.3706	130.4040	0.6608	160.8449	32.6223
Exactly-II	160.9872	28.8482	104.5369	16.3732	103.8305	15.5362	138.4619	1.5233
HeartEW	64.6831	0.5820	63.9890	1.2677	62.0172	0.4340	117.2483	5.8989
Heart-StatLog	65.6303	2.4111	61.9220	0.6557	63.5610	0.8020	69.5893	2.1312
Hepatitis	56.2530	2.6983	52.3806	0.6959	53.9502	0.2703	56.3769	0.7018
Hillvalley	112.4195	5.7780	104.7131	1.6637	107.7605	0.9396	160.7605	29.3465
Lung	94.0488	0.6251	98.7862	14.4405	93.5590	6.5153	128.8806	24.1963
Lymphography	52.0027	0.4803	51.7717	0.3915	55.4776	1.2790	54.9828	0.9030
M-of-N	142.4633	0.5464	139.6063	1.4464	187.5365	27.5968	213.5369	45.8423
Movementlibras	80.5755	1.1349	76.4237	0.6171	83.6223	1.1330	133.8762	27.5205
Sonar	83.0989	2.6769	68.6809	4.6286	66.0179	4.0894	163.3987	14.9518
WDBC	98.7807	2.4342	94.2362	2.1002	94.7865	1.8869	150.0455	15.1401
Wine	57.7458	1.7390	53.1988	0.2864	53.1390	0.6381	75.3421	17.9341
Zoo	50.9054	1.3418	47.6673	0.2743	47.2806	0.7032	86.0175	8.2180
Overall wins	0		9		8		0
Rank (F-test)	2.6471		1.5882		1.8824		3.8824

Table 12 displays the number of selected features. HBBP achieves competitive results among the compared algorithms, obtaining the minimum value on 13 data sets (76.47%). As expected, HBBP ranks first in overall wins by a large margin (8.49 wins, 49.94%). Overall, HBBP combines the advantages of BES and BPSO, which may help it escape from local optima more readily than the original algorithms, and it is therefore able to select fewer features. As discussed in Section 3.1, a higher classification accuracy usually corresponds to a larger number of selected features; since HCBP achieves the best classification accuracy in the previous experiments, it is harder for it to also obtain the fewest features. Nevertheless, HCBP selects slightly fewer features than the basic algorithms BES and BPSO on Heart-StatLog, Movementlibras, Sonar and Zoo, and it ranks second in the F-test (tied with BES). Moreover, HCBP ranks first in terms of STD, with the lowest value on 11 data sets (64.71%), while HBBP has the lowest STD on 9 data sets (52.94%); both are far better than BES and BPSO. Thus, HCBP is quite stable among the compared algorithms.
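The accuracy/feature trade-off comes from the weighted fitness function of Section 3.1. That function is not reproduced in this section, so the sketch below assumes the common form fitness = alpha * (1 - accuracy) + (1 - alpha) * |S| / |N|, where |S| is the number of selected features and |N| the total number of features. The weight alpha = 0.99 is our inference: it reproduces, for example, the Table 14 entries for BreastEW (accuracy 0.9614, 2 of 30 features, fitness 0.0389) and Exactly (accuracy 1, 6 of 13 features, fitness 0.0046), assuming these data sets have 30 and 13 features.

```python
def fitness(accuracy, n_selected, n_total, alpha=0.99):
    """Weighted-sum fitness to minimize: alpha * error + (1 - alpha) * feature ratio.

    alpha = 0.99 is an assumption inferred from the tabulated values,
    not a value quoted from Section 3.1.
    """
    beta = 1.0 - alpha
    return alpha * (1.0 - accuracy) + beta * n_selected / n_total


breastew = fitness(0.9614, 2, 30)   # about 0.0389
exactly = fitness(1.0, 6, 13)       # about 0.0046
```

Under this weighting, removing one of 30 features lowers the fitness by only about 0.00033, which is cancelled by an accuracy drop of about 0.03 percentage points, so accuracy dominates the search.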

Table 12

Number of selected features obtained by HCBP, HBBP, BES and PSO

Data set	HCBP		HBBP		BES		BPSO
	AVG	STD	AVG	STD	AVG	STD	AVG	STD
Breastcancer	6	0	6	0	6	0	6	0
BreastEW	2	0	2	0	2	0	8.2333	1.5466
Congress	6.4667	1.3060	6.3333	1.4223	6.4333	1.5466	6.3667	1.2452
Exactly	6	0	6	0	6	0	6	0
Exactly-II	1.0667	0.3651	1.2333	1.2780	1.4333	1.6543	1	0
HeartEW	5.2	0.7611	5.2	0.7611	5.1	0.5477	4.9	0.3051
Heart-StatLog	3	0	3	0	3.0333	0.1826	3.1667	0.3790
Hepatitis	5.9667	0.5561	5.8667	0.8193	5.8667	0.6288	6.8667	0.7761
Hillvalley	45.2333	5.9694	42.9333	6.0965	44.5667	6.4844	47.3333	6.0306
Lung	148.7333	8.7924	145.9333	9.1799	146.3	11.0396	154.5667	8.9199
Lymphography	7.4	1.5222	7.1	1.3734	7.2667	1.4840	7.6	1.5447
M-of-N	6	0	6	0	6	0	6.1	0.3051
Movementlibras	40.3333	3.1984	39.5667	3.4808	40.3667	4.1146	42.4333	4.7320
Sonar	25.1	4.0288	25.4667	3.3294	25.9667	4.4682	27	3.4341
WDBC	2	0	2	0	2	0	9.5333	1.5916
Wine	5.0333	0.1826	5.0667	0.5208	5	0.6433	5.0667	0.2537
Zoo	7.3333	1.2685	7.3333	0.8023	7.6	1.2758	7.9333	1.2576
Overall wins	3.49		8.49		2.49		2.5
Rank (F-test)	2.3824		1.9118		2.3824		3.3235

Table 13 summarizes the overall performance of the two proposed hybrid models against the basic optimizers. The CPU running time is excluded from this summary because its scale differs greatly from those of the other evaluation metrics; in other words, the discrepancies in running time are much larger than the others, so including it may lead to inaccurate conclusions. Moreover, running time is of secondary concern here because many practical applications prioritize classification accuracy. As the results in Table 13 show, HCBP achieves the best overall F-test rank. Therefore, we choose HCBP for the following experiments.
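The average and final ranks in Table 13 are simple arithmetic on the three F-test ranks of each algorithm. The sketch below recomputes them; the HBBP feature rank is taken from Table 12 (1.9118).

```python
# F-test (mean) ranks for fitness, accuracy and selected features.
RANKS = {
    "HCBP": (1.8824, 1.8824, 2.3824),
    "HBBP": (2.1471, 2.3824, 1.9118),  # feature rank from Table 12
    "BES":  (2.5588, 2.5000, 2.3824),
    "BPSO": (3.4118, 3.2353, 3.3235),
}


def final_ranking(ranks):
    """Average the per-metric ranks and order algorithms from best (lowest) to worst."""
    averages = {name: sum(r) / len(r) for name, r in ranks.items()}
    order = sorted(averages, key=averages.get)
    return averages, order


averages, order = final_ranking(RANKS)
```

The resulting order, HCBP, HBBP, BES, BPSO, matches the final-rank row of Table 13.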

Table 13

Overall rank by F-test for HCBP, HBBP, BES and BPSO based on fitness, accuracy and selected features

Evaluation	HCBP	HBBP	BES	BPSO
Fitness	1.8824	2.1471	2.5588	3.4118
Accuracy	1.8824	2.3824	2.5000	3.2353
Features	2.3824	1.9118	2.3824	3.3235
Average rank	2.0491	2.1471	2.4804	3.3235
Final rank	1	2	3	4

5.3.2 The results of HCBP, BGWOPSO, BBA, BDA and BGWO

In this section, HCBP is compared with several algorithms from the literature on the same data sets to further verify the performance of the proposed method. The results are discussed below.

Table 14 reports the fitness function values obtained by HCBP, BGWOPSO, BBA, BDA and BGWO. In terms of overall wins, HCBP shows an overwhelming advantage over the compared algorithms on most data sets, with 12.5 overall wins, far more than any other algorithm. BBA ranks second with 3.5 overall wins, while the other algorithms obtain the best value on at most one data set, giving them the lowest overall wins. For the standard deviation, HCBP attains the minimum value on 13 data sets (76.47%), followed by BBA on 6 data sets (35.29%); BGWOPSO attains the minimum only on Sonar, and BGWO attains it on none. In addition, the F-test results further confirm the superiority of HCBP over the other competitors.

Table 15 compares the significance of the differences in fitness values between HCBP and the other optimizers. HCBP performs far better than the algorithms from the literature: it differs significantly ($p\leqslant 0.05$) from BGWOPSO, BBA, BDA and BGWO on 100%, 47.06%, 82.35% and 88.24% of the 17 data sets, respectively, and in each of these cases it has the lower average fitness (Table 14). In contrast, HCBP does not differ significantly from HBBP on any data set, which indicates that the two hybrid methods are equivalent according to the Wilcoxon rank-sum test.
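The percentages above are shares of all 17 data sets, not only of the data sets with a defined p-value. The sketch below counts the significant entries (p ≤ 0.05) in the BBA column of Table 15 and reports both shares: 8 of 17 (47.06%) versus 8 of the 14 non-NaN data sets (57.14%). A significant p-value only indicates a difference; its direction has to be read from the averages in Table 14. The function name is hypothetical.

```python
import math

NAN = float("nan")
# HCBP versus BBA p-values from Table 15 (fitness), in data-set order.
P_BBA = [8.10e-02, NAN, 1.74e-02, NAN, 1.02e-02, 9.65e-02, 2.06e-02, 4.00e-06,
         8.91e-02, 7.20e-05, 7.77e-01, NAN, 7.67e-01, 3.40e-05, 1.08e-12,
         2.19e-02, 5.36e-01]


def significance_share(p_values, alpha=0.05):
    """Return (count of p <= alpha, share of all data sets, share of non-NaN data sets)."""
    meaningful = [p for p in p_values if not math.isnan(p)]
    hits = sum(1 for p in meaningful if p <= alpha)
    return hits, hits / len(p_values), hits / len(meaningful)


hits, share_all, share_meaningful = significance_share(P_BBA)
```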

Table 14
Fitness function values obtained by HCBP, BGWOPSO, BBA, BDA and BGWO

Data set	HCBP		BGWOPSO		BBA		BDA		BGWO
	AVG	STD	AVG	STD	AVG	STD	AVG	STD	AVG	STD
Breastcancer	0.0272	0	0.0305	0.0022	0.0272	0	0.0294	0.0024	0.0295	0.0028
BreastEW	0.0389	0	0.0435	0.0021	0.0401	0.0005	0.0418	0.0030	0.0501	0.0043
Congress	0.0243	0.0025	0.032	0.0030	0.0258	0.0027	0.0290	0.0033	0.0312	0.0032
Exactly	0.0046	0	0.1903	0.1060	0.0046	0	0.0485	0.1066	0.1175	0.0957
Exactly-II	0.2099	0.0012	0.2124	0.0048	0.2139	0.0080	0.2150	0.0084	0.2315	0.0087
HeartEW	0.1493	0.0016	0.1623	0.0064	0.1499	0.0023	0.1629	0.0104	0.1687	0.0107
Heart-StatLog	0.1416	0	0.1572	0.0161	0.1418	0.0003	0.1573	0.0230	0.1623	0.0242
Hepatitis	0.2619	0.0042	0.2950	0.0136	0.2698	0.0070	0.2872	0.0150	0.3063	0.0171
Hillvalley	0.3936	0.0062	0.4066	0.0061	0.3962	0.0049	0.3915	0.0132	0.3964	0.0067
Lung	0.0904	0.0107	0.1303	0.0134	0.1032	0.0109	0.1065	0.0218	0.1206	0.0193
Lymphography	0.5574	0.0107	0.5794	0.0109	0.5598	0.0102	0.5750	0.0184	0.5848	0.0172
M-of-N	0.0046	0	0.0892	0.0583	0.0046	0	0.0154	0.0387	0.0247	0.0203
Movementlibras	0.1822	0.0044	0.1968	0.0045	0.1820	0.0052	0.1850	0.0080	0.1856	0.0065
Sonar	0.0831	0.0091	0.1187	0.0074	0.0946	0.0098	0.1048	0.0178	0.1112	0.0116
WDBC	0.0389	0	0.0441	0.0029	0.0409	0.0009	0.0424	0.0043	0.0531	0.0066
Wine	0.0479	0.0001	0.0554	0.0056	0.0486	0.0021	0.0538	0.0067	0.0620	0.0100
Zoo	0.0625	0.0067	0.0850	0.0110	0.0622	0.0066	0.0845	0.0135	0.0838	0.0087
Overall wins	12.5		0		3.5		1		0
Rank (F-test)	1.2647		4.2941		1.9118		3.1176		4.4118

Table 15

$p$-values of the Wilcoxon rank-sum test for the fitness values obtained by HCBP versus HBBP, BGWOPSO, BBA, BDA and BGWO ($p\leqslant 0.05$ is significant and shown in bold; NaN: not applicable)

Data set	HBBP	BGWOPSO	BBA	BDA	BGWO
Breastcancer	NaN	4.73E-11	8.10E-02	5.35E-01	1.29E-04
BreastEW	NaN	1.47E-11	NaN	3.16E-07	2.80E-05
Congress	9.94E-01	2.40E-10	1.74E-02	5.83E-07	2.33E-10
Exactly	NaN	4.32E-12	NaN	1.39E-04	5.40E-11
Exactly-II	9.81E-01	1.34E-03	1.02E-02	1.12E-03	2.80E-11
HeartEW	5.57E-01	4.04E-10	9.65E-02	1.65E-08	2.30E-12
Heart-StatLog	NaN	5.86E-10	2.06E-02	2.00E-06	3.93E-12
Hepatitis	3.97E-01	2.15E-10	4.00E-06	3.39E-10	6.68E-12
Hillvalley	9.12E-01	8.86E-09	8.91E-02	2.71E-01	1.07E-01
Lung	8.53E-01	4.83E-10	7.20E-05	1.99E-02	6.78E-09
Lymphography	7.34E-01	5.95E-09	7.77E-01	3.60E-05	5.31E-08
M-of-N	NaN	1.57E-11	NaN	1.30E-03	5.74E-10
Movementlibras	8.22E-02	3.66E-11	7.67E-01	1.39E-01	6.45E-02
Sonar	6.79E-01	2.82E-11	3.40E-05	9.00E-06	1.12E-09
WDBC	NaN	1.12E-12	1.08E-12	4.21E-12	1.12E-12
Wine	8.55E-02	8.01E-12	2.19E-02	4.26E-08	1.81E-12
Zoo	9.87E-01	1.37E-09	5.36E-01	1.25E-08	1.34E-10

Figures 9 and 10 show the convergence curves of HCBP and the compared algorithms. As before, each curve is taken from the 15th run. HCBP converges fastest, or nearly so, on all data sets except Lymphography. Specifically, HCBP reaches the lowest final fitness value on 9 data sets: BreastEW, Congress, Hepatitis, Hillvalley, Movementlibras, Sonar, WDBC, Wine and Zoo. This indicates that the hybrid strategy of BES and PSO works effectively and that the chaos in the hybrid method broadens the solution space and directs the population toward promising areas, which helps lead the population out of local optima and lets HCBP converge rapidly to the optimal solutions.

Figure 9.

Convergence curves obtained by HCBP, BGWOPSO, BBA, BDA and BGWO on the first 9 data sets.

Figure 10.

Convergence curves obtained by HCBP, BGWOPSO, BBA, BDA and BGWO on the last 8 data sets.

Table 16 presents the average classification accuracy and the F-test ranks of HCBP and the compared algorithms. The distribution of the results is similar to that in Table 14. HCBP is dominant, accounting for 11 of the 17 overall wins (64.71%). BBA ranks second with 5 overall wins, whereas BDA obtains the best value on only one data set and BGWO on none. Moreover, the STDs show that HCBP varies less than the others, and the F-test results indicate that HCBP performs far better than BGWOPSO, BDA and BGWO.

To assess the significance of the differences in classification accuracy, we also conduct the Wilcoxon rank-sum test for HCBP against the other competitors. The results in Table 17 support the argument that HCBP achieves higher accuracy than the compared algorithms on the majority of the informative (non-NaN) data sets. Moreover, there is no significant difference between the two hybrid methods in the classification accuracy of the selected subsets. This suggests that HBBP can also outperform the other compared algorithms in classification accuracy.

The results in Tables 14, 16 and 17 show that HCBP outperforms the compared algorithms on the majority of the data sets and is quite stable across runs. We attribute these advantages to the hybrid strategy working effectively when searching for the optimal solution in the discrete space. Note also that all the compared algorithms were originally designed for continuous optimization and all use a transfer function to convert continuous solutions into binary ones, which suggests that the proposed binarization strategy benefits the hybrid algorithm and contributes to its effectiveness. On the other hand, BES has been reported to outperform some existing meta-heuristic algorithms, such as GWO [81] and PSO [82], on certain optimization problems, so well-designed improvements of BES may help solve the formulated problem more efficiently. Specifically, HCBP broadens the solution space, which helps maintain the diversity of the candidate solutions and thus balances exploration and exploitation. The main limitation of the other algorithms is the so-called "stagnation": they can easily fall into local optima and find it difficult to escape, whereas HCBP mitigates this problem more effectively. The chaos in HCBP helps overcome this limitation, and the transfer strategy suits the proposed hybrid method; therefore, HCBP achieves much better fitness values and classification accuracy than the other compared algorithms.
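Binary variants of continuous optimizers typically map each continuous coordinate to a selection probability with a transfer function and then sample a 0/1 feature mask. The proposed optimized transfer function is defined elsewhere in the paper and is not reproduced here; the sketch below shows only the standard S-shaped (sigmoid) transfer familiar from binary PSO, with hypothetical function names.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function (no overflow for large |x|)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def binarize(position, rng=None):
    """S-shaped transfer: feature j is selected (1) with probability sigmoid(x_j)."""
    if rng is None:
        rng = random.Random()
    return [1 if rng.random() < sigmoid(x) else 0 for x in position]


mask = binarize([-50.0, 0.3, 50.0], random.Random(0))
```

Strongly negative coordinates almost never select a feature and strongly positive ones almost always do, while coordinates near zero are selected with probability close to one half.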

Table 16

Classification accuracy obtained by HCBP, BGWOPSO, BBA, BDA and BGWO

Data set	HCBP		BGWOPSO		BBA		BDA		BGWO
	AVG	STD	AVG	STD	AVG	STD	AVG	STD	AVG	STD
Breastcancer	0.9786	0	0.9749	0.0023	0.9786	0	0.9759	0.0030	0.9767	0.0026
BreastEW	0.9614	0	0.9596	0.0016	0.9614	0	0.9596	0.0029	0.9540	0.0046
Congress	0.9795	0.0027	0.9720	0.0032	0.9783	0.0026	0.9747	0.0033	0.9739	0.0032
Exactly	1	0	0.8145	0.1067	1	0	0.9556	0.1084	0.8879	0.0954
Exactly-II	0.7888	0.0009	0.7868	0.0040	0.7856	0.0064	0.7845	0.0072	0.7716	0.0071
HeartEW	0.8531	0.0012	0.84	0.0065	0.8526	0.0019	0.8396	0.0109	0.8346	0.0114
Heart-StatLog	0.8593	0	0.8448	0.0163	0.8593	0	0.8442	0.0228	0.8414	0.0239
Hepatitis	0.7387	0.0041	0.7058	0.0136	0.7309	0.0069	0.7131	0.0155	0.6956	0.0179
Hillvalley	0.6070	0.0060	0.5931	0.0062	0.6045	0.0049	0.6077	0.0127	0.6067	0.0065
Lung	0.9133	0.0109	0.8725	0.0137	0.9004	0.0111	0.8954	0.0216	0.8846	0.0196
Lymphography	0.4411	0.0108	0.4189	0.0114	0.4389	0.0102	0.4231	0.0190	0.4151	0.0173
M-of-N	1	0	0.9164	0.0587	1	0	0.9893	0.0390	0.9808	0.0199
Movementlibras	0.8205	0.0044	0.8053	0.0048	0.8207	0.0051	0.8169	0.0079	0.8196	0.0064
Sonar	0.9203	0.0093	0.8841	0.0075	0.9090	0.0097	0.8976	0.0179	0.8943	0.0117
WDBC	0.9614	0	0.9589	0.0026	0.9609	0.0008	0.9588	0.0040	0.9509	0.0065
Wine	0.9556	0	0.9478	0.0058	0.9548	0.0024	0.9494	0.0070	0.9430	0.0097
Zoo	0.9415	0.0066	0.9191	0.0113	0.9418	0.0066	0.9194	0.0139	0.9218	0.0091
Overall wins	11		0		5		1		0
Rank (F-test)	1.3529		4.2049		1.8824		3.3824		4.1765

Table 17

$p$-values of the Wilcoxon rank-sum test for the classification accuracy obtained by HCBP versus HBBP, BGWOPSO, BBA, BDA and BGWO ($p\leqslant 0.05$ is significant and shown in bold; NaN: not applicable)

Data set	HBBP	BGWOPSO	BBA	BDA	BGWO
Breastcancer	NaN	3.96E-11	7.13E-02	2.44E-01	1.32E-02
BreastEW	NaN	1.31E-11	NaN	3.13E-07	2.70E-05
Congress	9.74E-01	3.90E-10	7.33E-02	1.00E-06	1.04E-08
Exactly	NaN	4.31E-12	NaN	1.39E-04	5.40E-11
Exactly-II	9.81E-01	1.01E-02	1.93E-02	2.24E-03	8.65E-11
HeartEW	5.57E-01	2.95E-10	4.96E-02	9.65E-02	2.06E-12
Heart-StatLog	NaN	5.33E-09	NaN	1.20E-05	1.09E-11
Hepatitis	3.71E-01	2.66E-11	3.00E-06	2.37E-10	4.03E-12
Hillvalley	9.58E-01	3.49E-09	1.03E-01	5.19E-01	8.99E-01
Lung	5.97E-01	9.58E-11	8.10E-05	3.72E-04	4.54E-08
Lymphography	2.34E-01	1.14E-08	6.70E-01	5.10E-05	2.32E-07
M-of-N	NaN	1.57E-11	NaN	1.30E-03	5.74E-10
Movementlibras	9.37E-02	3.68E-11	8.38E-01	7.33E-02	7.64E-01
Sonar	7.08E-01	1.99E-11	9.20E-05	2.00E-06	1.83E-09
WDBC	NaN	1.29E-08	2.59E-03	1.20E-05	6.34E-13
Wine	3.17E-01	4.71E-09	7.81E-02	2.00E-06	1.51E-10
Zoo	2.39E-01	1.41E-09	8.00E-01	3.22E-08	7.54E-10

Table 18 reports the average computational time in seconds of each algorithm over 30 runs. BGWOPSO has the lowest average CPU running time on 12 data sets (12 overall wins), followed by HCBP with 3 overall wins. For the STDs, HCBP attains the lowest value on 7 data sets and BGWOPSO on 8. Although HCBP is not the fastest algorithm for the designed feature selection problem, its running times are comparable to those of BGWOPSO on many data sets, and it is clearly faster than BBA, BDA and BGWO on most data sets. Consequently, HCBP ranks second in the F-test, and its overhead is acceptable because the additional components consume less extra computational time than we expected.

Table 19 lists the average number of features selected by HCBP and the other compared algorithms. HCBP tends to select fewer features than most of the others: it obtains the smallest average on 8 data sets (6.83 overall wins, since ties are shared), second only to BDA (9 data sets, 8.33 overall wins). Moreover, HCBP has the lowest STD on 13 data sets, far better than BGWOPSO, BBA, BDA and BGWO. In the F-test on the average number of selected features, HCBP ranks behind BDA but ahead of BGWOPSO, BBA and BGWO. Taken together, HCBP achieves the best average classification accuracy while retaining relatively few selected features among the compared algorithms, so it should be applicable to a wide range of applications of the formulated problem.

Table 18

CPU running time (in seconds) obtained by HCBP, BGWOPSO, BBA, BDA and BGWO

Data set	HCBP		BGWOPSO		BBA		BDA		BGWO
	AVG	STD	AVG	STD	AVG	STD	AVG	STD	AVG	STD
Breastcancer	101.26	0.51	90.58	5.28	110.93	3.85	112.14	3.20	109.95	2.58
BreastEW	93.57	1.47	97.78	1.22	113.65	6.51	103.91	7.68	146.99	73.29
Congress	84.02	1.65	80.79	1.570	87.02	3.10	84.41	3.72	118.56	10.13
Exactly	144.94	2.63	114.487	7.90	148.94	4.68	139.98	5.48	165.87	11.02
Exactly-II	160.99	28.85	129.46	4.55	136.491	8.58	126.66	10.76	152.26	15.82
HeartEW	64.68	0.58	76.83	15.75	69.99	1.24	68.38	1.64	69.61	1.91
Heart-StatLog	65.63	2.41	62.68	0.77	72.74	21.26	70.71	3.48	70.27	1.60
Hepatitis	56.25	2.70	50.12	0.48	53.98	0.45	55.76	5.12	54.79	0.75
Hillvalley	112.42	5.77	90.70	1.08	97.97	2.30	191.36	213.34	105.99	11.38
Lung	94.05	0.63	50.29	0.83	53.76	2.72	52.96	1.75	201.29	92.80
Lymphography	52.00	0.48	49.78	0.27	53.75	0.85	52.34	0.70	54.26	1.27
M-of-N	142.46	0.55	170.57	30.31	145.45	4.60	138.87	5.89	154.76	7.37
Movementlibras	80.58	1.13	72.15	8.07	110.84	5.94	80.83	25.83	346.96	28.51
Sonar	83.10	2.68	59.52	7.82	209.40	91.99	139.90	31.23	229.36	91.06
WDBC	98.78	2.43	99.61	1.36	131.24	13.97	141.62	43.90	171.72	57.00
Wine	57.75	1.74	51.72	0.45	58.25	0.67	58.99	1.08	73.98	5.60
Zoo	50.91	1.34	44.91	0.45	51.61	0.35	52.79	0.86	60.73	3.36
Overall wins	3		12		0		2		0
Rank (F-test)	2.471		1.647		3.471		3.118		4.294

Table 19

Number of selected features obtained by HCBP, BGWOPSO, BBA, BDA and BGWO

Data set	HCBP		BGWOPSO		BBA		BDA		BGWO
	AVG	STD	AVG	STD	AVG	STD	AVG	STD	AVG	STD
Breastcancer	6	0	5.6	1.13	6	0	5.57	0.86	6.37	0.67
BreastEW	2	0	10.4	2.84	5.73	1.62	5.27	1.89	13.57	2.56
Congress	6.47	1.31	6.8	1.73	6.97	1.45	6.33	2.22	8.5	1.11
Exactly	6	0	8.57	1.50	6	0	6	1.46	8.5	1.76
Exactly-II	1.07	0.37	1.8	1.32	2.13	2.27	2.13	2.08	7	2.55
HeartEW	5.2	0.76	5.07	1.175	5.1	0.84	5.23	1.33	6.33	1.42
Heart-StatLog	3	0	4.7	1.26	3.17	0.38	3.97	1.25	6.83	1.29
Hepatitis	5.97	0.56	7	2.48	6.5	1.17	6.07	1.68	9.3	1.66
Hillvalley	45.23	5.97	37.83	5.84	46.27	3.41	30.57	12.39	70.73	4.17
Lung	148.73	8.79	131.17	18.01	150.17	10.65	96.03	32.17	207.53	10.64
Lymphography	7.4	1.52	7.33	1.86	7.67	1.69	7.03	2.17	10.3	1.62
M-of-N	6	0	8.4	1.4994	6	0	6.2667	0.45	7.4	0.86
Movementlibras	40.33	3.20	36.4	6.33	40.6	3.94	32.8	7.91	63.73	3.50
Sonar	25.1	4.03	23.8	5.01	27.17	3.96	20.5	5.58	39.23	3.14
WDBC	2	0	10.4	2.46	6.8	1.67	5.13	2.29	13.97	2.39
Wine	5.03	0.18	4.87	1.33	5.03	0.67	4.83	1.26	7.23	0.97
Zoo	7.33	1.27	7.87	1.55	7.43	1.01	7.5	1.91	10.23	1.38
Overall wins	6.83		1		0.83		8.33		0
Rank (F-test)	2.147		2.941		3.118		1.912		4.882

Table 20 presents the overall comparison between HCBP and the other optimizers. The final rank of HCBP is better than those of the compared algorithms, which indicates that the quality of the best solutions returned by HCBP exceeds that of its peers. We attribute this superiority to the components of the proposed method, namely the hybridization of the original algorithms, the use of chaos and the customized binarization, which together alleviate stagnation effectively. In each iteration, the fitness of every individual in the population is calculated by BES, and the position of each candidate solution is updated by PSO. Since BES and PSO have both been shown to be effective meta-heuristic algorithms, an efficient hybridization of them can be expected to further improve the solution of the designed feature selection problem. The chaotic initialization in HCBP may maintain the diversity of the population, which enables each generation to search for the optimal solution in a more promising region. The binarization strategy helps HCBP convert solutions to binary form more effectively than the other optimizers. In summary, the proposed hybrid method helps discover better solutions while avoiding entrapment in local optima. Therefore, HCBP achieves higher classification accuracy, better fitness values, reasonable computational time and a suitable number of selected features.
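For reference, the PSO part of the update can be sketched with the canonical velocity and position rule, v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x) followed by x = x + v. How this step is interleaved with the BES phases in HBBP and HCBP is defined in the paper's algorithm description; the parameter values below (w = 0.7, c1 = c2 = 1.5) are illustrative assumptions, not the settings used in the experiments.

```python
import random


def pso_step(x, v, pbest, gbest, rng, w=0.7, c1=1.5, c2=1.5):
    """One canonical PSO update for a single particle (lists of floats).

    w, c1 and c2 are illustrative defaults, not the paper's parameter settings.
    """
    new_x, new_v = [], []
    for xi, vi, pi, gi in zip(x, v, pbest, gbest):
        r1, r2 = rng.random(), rng.random()       # fresh random factors per dimension
        vi = w * vi + c1 * r1 * (pi - xi) + c2 * r2 * (gi - xi)
        new_v.append(vi)
        new_x.append(xi + vi)
    return new_x, new_v


x, v = pso_step([0.0, 1.0], [0.0, 0.0], [0.5, 0.5], [1.0, 0.0], random.Random(1))
```

In a binary setting, the updated continuous position would then pass through a transfer function to obtain the feature mask.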

Table 20

Overall rank by F-test for HCBP, BGWOPSO, BBA, BDA and BGWO based on fitness, accuracy and selected features

Evaluation	HCBP	BGWOPSO	BBA	BDA	BGWO
Fitness	1.2647	4.2941	1.9118	3.1176	4.4118
Accuracy	1.3529	4.2049	1.8824	3.3824	4.1765
Features	2.1471	2.9412	3.1176	1.9118	4.8824
Average rank	1.5882	3.8134	2.3039	2.8039	4.4902
Final rank	1	4	2	3	5

Table 21

Average classification accuracy obtained by HCBP, HBBP and filter-based algorithms from the literature

Data set	HCBP	HBBP	CFS	FCBF	F-Score	IG	Spectrum
Breastcancer	0.979	0.979	0.957	0.986	0.979	0.957	0.957
BreastEW	0.961	0.961	0.825	0.798	0.930	0.930	0.772
Congress	0.980	0.980	0.793	0.793	0.908	0.828	0.828
Exactly	1	1	0.670	0.440	0.600	0.615	0.575
Exactly-II	0.789	0.788	0.705	0.545	0.680	0.620	0.660
HeartEW	0.853	0.853	0.648	0.648	0.759	0.759	0.796
Lymphography	0.441	0.438	0.500	0.567	0.667	0.667	0.767
M-of-N	1	1	0.785	0.815	0.815	0.815	0.580
Sonar	0.920	0.919	0.310	0.214	0.048	0.191	0.048
Zoo	0.942	0.943	0.800	0.900	0.650	0.850	0.600

5.4 Comparison with filter-based feature selection algorithms from the literature

In this section, the classification accuracy of the proposed hybrid wrapper-based algorithm (HCBP) is compared with several well-established filter-based algorithms, namely correlation-based feature selection (CFS), fast correlation-based filter (FCBF), F-score, information gain (IG) and Spectrum. The classification accuracies of these filter-based algorithms are taken from [83]. Note that in [83] the $k$NN classifier ($k=5$) is used in conjunction with the filter-based algorithms, and the training data are partitioned into 80% for learning and the remaining 20% for validation. Table 21 presents the average classification accuracy on the shared data sets.
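The evaluation protocol of [83] (kNN with k = 5 and an 80/20 split) can be sketched as follows for a given binary feature mask. The helper names and the toy data are hypothetical, squared Euclidean distance is assumed, and the wrapper evaluation used for HCBP and HBBP may differ in its details.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k=5):
    """Majority vote among the k training points closest to x (squared Euclidean)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def holdout_accuracy(X, y, mask, k=5, train_frac=0.8, seed=0):
    """kNN accuracy on a random 80/20 split, using only columns where mask[j] == 1."""
    cols = [j for j, m in enumerate(mask) if m]
    Xs = [[row[j] for j in cols] for row in X]
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(train_frac * len(idx))
    train, test = idx[:cut], idx[cut:]
    train_X = [Xs[i] for i in train]
    train_y = [y[i] for i in train]
    correct = sum(knn_predict(train_X, train_y, Xs[i], k) == y[i] for i in test)
    return correct / len(test)
```

In a wrapper method, a function of this kind is called once per candidate mask, which is why wrappers cost far more time than filters that score features without training a classifier.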

On the 10 data sets shared with the literature, HCBP achieves the best accuracy on 7 data sets (70.00%) and HBBP on 6 data sets (60.00%), ties included, so both hold a substantial advantage over the competitors. FCBF and Spectrum each attain the best accuracy on a single data set, Breastcancer and Lymphography respectively, whereas CFS, F-Score and IG do not achieve the best accuracy on any data set.
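These win counts can be checked directly against Table 21. The sketch below counts, for each method, the data sets on which its three-decimal accuracy equals the row maximum, so a tie counts as a win for every tied method. The dictionary layout and the helper name `count_wins` are illustrative assumptions; unrounded accuracies, which are not given here, could break some ties.

```python
# Accuracy values from Table 21 (columns: HCBP, HBBP, CFS, FCBF, F-Score, IG, Spectrum).
methods = ["HCBP", "HBBP", "CFS", "FCBF", "F-Score", "IG", "Spectrum"]
table21 = {
    "Breastcancer": [0.979, 0.979, 0.957, 0.986, 0.979, 0.957, 0.957],
    "BreastEW":     [0.961, 0.961, 0.825, 0.798, 0.930, 0.930, 0.772],
    "Congress":     [0.980, 0.980, 0.793, 0.793, 0.908, 0.828, 0.828],
    "Exactly":      [1.000, 1.000, 0.670, 0.440, 0.600, 0.615, 0.575],
    "Exactly-II":   [0.789, 0.788, 0.705, 0.545, 0.680, 0.620, 0.660],
    "HeartEW":      [0.853, 0.853, 0.648, 0.648, 0.759, 0.759, 0.796],
    "Lymphography": [0.441, 0.438, 0.500, 0.567, 0.667, 0.667, 0.767],
    "M-of-N":       [1.000, 1.000, 0.785, 0.815, 0.815, 0.815, 0.580],
    "Sonar":        [0.920, 0.919, 0.310, 0.214, 0.048, 0.191, 0.048],
    "Zoo":          [0.942, 0.943, 0.800, 0.900, 0.650, 0.850, 0.600],
}

def count_wins(table, names):
    """Count data sets on which each method attains the row maximum (ties shared)."""
    wins = dict.fromkeys(names, 0)
    for row in table.values():
        best = max(row)
        for name, acc in zip(names, row):
            if acc == best:
                wins[name] += 1
    return wins

wins = count_wins(table21, methods)
n = len(table21)
for name in methods:
    print(f"{name:8s} {wins[name]:2d}/{n} ({100 * wins[name] / n:.2f}%)")
```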

The good performance of HCBP and HBBP on the majority of the data sets can be attributed to combining the advantages of BES and PSO. As in the previous experiments, HCBP generally outperforms HBBP, most likely because the chaotic initialization improves the quality of the population and helps find more solutions with high classification accuracy. Furthermore, HCBP and HBBP map solutions from continuous to binary values using the proposed optimized transfer function. Note that although the hybrid algorithms are clearly superior to the comparative filter-based algorithms on the tested data sets for the formulated problem, this does not mean that they are a better choice than filter-based methods for every feature selection problem, particularly since the filter approach is much faster and more scalable than the wrapper approach.
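The optimized transfer function itself is not specified in this section, so the sketch below uses the classic S-shaped (sigmoid) transfer function of binary PSO [68] purely as a stand-in to show how a continuous-to-binary mapping works: each continuous component is squashed into a probability and compared with a uniform random number to decide whether the corresponding feature is selected. This is an assumption for illustration, not the transfer function proposed for HBBP and HCBP; `s_transfer` and `binarize` are hypothetical names.

```python
import math
import random
from typing import List, Optional

def s_transfer(v: float) -> float:
    """S-shaped (sigmoid) transfer function T(v) = 1 / (1 + exp(-v)), overflow-safe."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    z = math.exp(v)  # avoids exp(-v) overflowing for very negative v
    return z / (1.0 + z)

def binarize(position: List[float], rng: Optional[random.Random] = None) -> List[int]:
    """Set bit j to 1 (feature selected) with probability T(position[j])."""
    rng = rng or random.Random()
    return [1 if rng.random() < s_transfer(v) else 0 for v in position]

# Strongly negative components are almost never selected, strongly positive almost always.
print(binarize([-50.0, 0.0, 50.0], rng=random.Random(0)))
```

V-shaped functions, also discussed in [68], instead flip the current bit with probability $|\tanh(v)|$; which family suits a given hybrid is an empirical question.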

6. Conclusion and future work

In this paper, we develop a hybrid population-based evolutionary algorithm that embeds PSO into BES to alleviate the drawbacks of BES and to tackle the formulated single-objective feature selection problem. First, we introduce the main procedures of the original algorithms, i.e., BES and PSO, and analyze their advantages and disadvantages. Second, we propose two hybrid algorithms, HBBP and HCBP, to overcome the shortcomings of conventional BES; building on HBBP, HCBP adopts the logistic-sine ($LS$) chaotic map to broaden the solution space and maintain population diversity. Third, a binarization strategy is introduced into HBBP and HCBP to convert the solutions from continuous to binary space at each generation. To test the effectiveness of the proposed hybrid methods, experiments are conducted on 17 well-known UCI data sets, and the results are compared in two phases. In the first phase, HBBP and HCBP are compared with the original optimizers (i.e., BES and BPSO). From the results, we conclude that: (i) BES solves the formulated problem better than BPSO; (ii) HBBP outperforms BES and BPSO to some extent on the majority of the data sets in terms of all indicators (i.e., fitness value, classification accuracy, number of selected features and CPU running time); and (iii) HCBP outperforms HBBP in the overall ranking (i.e., F-test). In the second phase, the better-performing hybrid method (i.e., HCBP) is compared with four meta-heuristic algorithms, namely BGWOPSO, BBA, BDA and BGWO. The experimental results show that HCBP has a substantial advantage over these algorithms on the majority of the data sets, which also suggests that the binarization strategy is well suited to the proposed hybrid methods. Finally, the proposed hybrid methods are compared with five filter-based algorithms from the literature, and HCBP achieves the highest classification accuracy on 7 of the 10 shared data sets.
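The $LS$ chaotic initialization summarized above can be sketched with the logistic-sine system as it is commonly written in the chaos literature, $\omega[h+1] = \left(\lambda\,\omega[h]\,(1-\omega[h]) + \tfrac{4-\lambda}{4}\sin(\pi\,\omega[h])\right) \bmod 1$, using the symbols $h$, $\omega[h]$ and $\lambda \in [0,4]$ of Table A-1. This form, the seed value, $\lambda = 3.9$ and the linear scaling into the search bounds are assumptions; the paper's Eq. (21) governs the exact variant used in HCBP, and `ls_map` and `chaotic_population` are hypothetical names.

```python
import math
from typing import List

def ls_map(x: float, lam: float = 3.9) -> float:
    """One step of the assumed logistic-sine map; lam is expected in [0, 4]."""
    return (lam * x * (1.0 - x) + (4.0 - lam) * math.sin(math.pi * x) / 4.0) % 1.0

def chaotic_population(n: int, dim: int, lower: float, upper: float,
                       seed: float = 0.7, lam: float = 3.9) -> List[List[float]]:
    """Fill an n x dim population by iterating the map and scaling into [lower, upper)."""
    x = seed  # seed must not be 0, which is a fixed point of the map
    population = []
    for _ in range(n):
        individual = []
        for _ in range(dim):
            x = ls_map(x, lam)  # chaotic value in [0, 1)
            individual.append(lower + (upper - lower) * x)
        population.append(individual)
    return population

pop = chaotic_population(n=4, dim=3, lower=-1.0, upper=1.0)
for individual in pop:
    print([round(value, 3) for value in individual])
```

Compared with independent uniform draws, a single chaotic sequence is deterministic given its seed while still spreading values irregularly over the interval, which is the diversity argument made for HCBP.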

In future work, we plan to investigate the efficiency of HCBP and HBBP in other research fields, such as engineering applications [84], network applications [85] and parameter control applications [86]. Moreover, more powerful operators, such as opposition-based learning [87] and Lévy flight [88], could be introduced to further improve the performance of BES on the feature selection problem.


Acknowledgments

This study is supported in part by the National Natural Science Foundation of China (62172186, 62002133, 61872158, 62272194), in part by the Science and Technology Development Plan Project of Jilin Province (20210101183JC, 20210201072GX), and in part by the Young Science and Technology Talent Lift Project of Jilin Province (QT202013).

List of symbols and abbreviations

Table A-1

The meaning of symbols in this paper

Parameter(s)	The Corresponding Meaning
$\alpha$	The weight of the classification accuracy in Eq. (1)
$\beta$	The weight of the feature reduction in Eq. (1)
$\gamma(S)$	The classification accuracy of the feature subset $S$ for a given classifier in Eq. (1)
$C_n$, $T_n$	The number of selected features in the subset and the total number of features in the data set in Eq. (1)
$X$	The position of a search agent
$t$	The iteration index
$\nu$	The control weight related to the position in Eq. (2)
$rand(0,1)$	A random number drawn from [0, 1]
$X_{best}$	The best position found so far
$X_{mean}^{t}$	The mean position, which carries all useful information from the previous points
$X_{i}^{t}$	The position of the $i$th eagle at the $t$th iteration
$\Delta Y$	The step factor in the searching phase
$p(i)$, $q(i)$	The inertia coefficients that control the position of the eagle in polar coordinates
$\theta(i)$, $r(i)$	The polar angle and polar radius of the spiral equation
$d$	The parameter used to estimate the corner position during the point search around the center point in Eqs. (7), (8) and (10)
$G$	The number of search cycles in Eq. (10)
$\Delta Z$	The step factor in the swooping phase in Eqs. (11) and (12)
$m_1$, $m_2$	The control weights that increase the movement intensity of the eagle towards the best point and the center point in Eq. (12)
$p(i)'$, $q(i)'$	The position of the eagle in polar coordinates
$N$	The population size in Algorithms 3.2, 3.3 and 4.4.3
$X_i\ (i = 1, 2, \ldots, N)$	The population in Algorithms 3.2, 3.3 and 4.4.3
$t_{max}$	The maximum number of iterations in Algorithms 3.2, 3.3 and 4.4.3
$X_{pbest}^{t}$	The personal best solution at the current iteration in Eq. (18)
$X_{gbest}^{t}$	The global best solution found during the search in Eq. (18)
$\omega$	The inertia coefficient that balances local and global search, varying in (0, 1) in Eq. (18)
$c_1$, $c_2$	The cognitive factor and the social factor in Eq. (18)
$h$	The index of the chaotic sequence in Eq. (21)
$\omega[h]$	The $h$th element of the chaotic map in Eq. (21)
$\lambda$	The chaos multiplier, varying in [0, 4] in Eq. (21)
mod	The modulo operation in Eq. (21)
$\Delta X^{t}$	The transfer factor in the proposed hybrid algorithm in Eq. (24)
$\gamma_{j}^{t}$	The $j$th dimension of the solution obtained in the selecting phase at the $t$th iteration in Eq. (27)
$X^{t}$	The population from the different hybrid algorithms in previous iterations in Algorithm 4.4.3
$N_{dim}$	The dimension size of the solution in Algorithm 4.4.3
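Eq. (1) is not reproduced in this section, but Table A-1 indicates that it combines the classification accuracy $\gamma(S)$ and the feature ratio $C_n/T_n$ with the weights $\alpha$ and $\beta$. A linear-weighting form common in the wrapper feature selection literature, assumed here and to be minimized, is $f(S) = \alpha\,(1-\gamma(S)) + \beta\,C_n/T_n$ with $\beta = 1-\alpha$. The default $\alpha = 0.99$ and the function name `fitness` are illustrative assumptions, not values taken from the paper.

```python
def fitness(accuracy: float, n_selected: int, n_total: int, alpha: float = 0.99) -> float:
    """Assumed linear-weighted fitness (lower is better):
    f = alpha * (1 - accuracy) + beta * n_selected / n_total, with beta = 1 - alpha.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    if n_total <= 0 or not 0 <= n_selected <= n_total:
        raise ValueError("need 0 <= n_selected <= n_total and n_total > 0")
    beta = 1.0 - alpha
    return alpha * (1.0 - accuracy) + beta * n_selected / n_total

# Same accuracy, fewer features: the smaller subset gets the smaller (better) fitness.
print(fitness(0.95, 10, 30))
print(fitness(0.95, 5, 30))
```

With $\alpha$ close to 1, accuracy dominates and the feature term mainly breaks ties between subsets of similar accuracy.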

Table A-2

The expansions of abbreviations in this paper

Abbreviations	Expansions
HBBP	Hybrid basic bald eagle search-particle swarm optimization
HCBP	Hybrid chaos-based bald eagle search-particle swarm optimization
BES	Bald Eagle Search
PSO	Particle Swarm Optimization
IG	Information Gain
TRC	Trace ratio criterion
UCI	University of California, Irvine
BGWO	Binary grey wolf optimization
BDA	Binary dragonfly algorithm
UACO	Unsupervised ant colony optimization
BBA	Binary bat algorithm
BFA	Binary firefly algorithm
WOA	Whale optimization algorithm
ACO	Ant colony optimization
IBGAFG	Improved binary genetic algorithm with feature granulation
INRSG	Improved Neighborhood Rough set with sample granulation
ROGA	λ optimization based on genetic algorithm
BSF	Binary sailfish
IBDA	Improved binary dragonfly algorithm
EPD	Evolutionary population dynamics
CSA	Crow search algorithm
PBES	Polar-coordinate BES
GA	Genetic algorithm
$K$NN	$K$-nearest neighbor
BGWOPSO	Binary version of the hybrid GWO and PSO
NFL	No free lunch
BPSO	Binary particle swarm optimization
$LS$	Logistic-sine map
AVG	Average value
STD	Standard deviation
F-test	Friedman test
CFS	Correlation feature selection
FCBF	Fast correlation-based filter

References

1. Huang, Luo, Fujita, Horng S.-J., Dynamic variable precision rough set approach for probabilistic set-valued information systems, Knowledge-Based Systems 122 (2017), 131–147.

2. Jiao, Shang, Wang, Liu, Fast semi-supervised clustering with enhanced spectral embedding, Pattern Recognition 45(12) (2012), 4358–4369.

3. Rong, Gong, Gao, Feature selection and its use in big data: challenges, methods, and trends, IEEE Access 7 (2019), 19709–19725.

4. Liu, Wang, A brief survey on nature-inspired metaheuristics for feature selection in classification in this decade, in: 2019 IEEE 16th International Conference on Networking, Sensing and Control (ICNSC), IEEE, 2019, pp. 424–429.

5. Bolón-Canedo, Alonso-Betanzos, Morán-Fernández, Cancela, Feature Selection: From the Past to the Future, 2022, pp. 11–34. ISBN 978-3-030-93051-6. doi: 10.1007/978-3-030-93052-3_2.

6. Tsai C.-F., Sung Y.-T., Ensemble feature selection in high dimension, low sample size datasets: Parallel and serial combination approaches, Knowledge-Based Systems 203 (2020), 106097.

7. Guyon, Elisseeff, An introduction to variable and feature selection, Journal of Machine Learning Research 3(Mar) (2003), 1157–1182.

8. Wang, Chen, Alterovitz, Accelerating wrapper-based feature selection with K-nearest-neighbor, Knowledge-Based Systems 83(Jul.) (2015), 81–91.

9. Nie, Xiang, Jia, Zhang, Yan, Trace ratio criterion for feature selection, in: AAAI, Vol. 2, 2008, pp. 671–676.

10. Feature Selection Under Orthogonal Regression with Redundancy Minimizing, in: ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2020, pp. 3457–3461.

11. Tang, Alelyani, Liu, Feature selection for classification: A review, Data Classification: Algorithms and Applications (2014), 37.

12. Edelkamp, Schrodl, Heuristic search: theory and applications, Elsevier, 2011.

13. Talbi E.-G., Metaheuristics: from design to implementation, John Wiley & Sons, 2009.

14. Alsattar, Zaidan, Novel meta-heuristic bald eagle search optimisation algorithm, Artificial Intelligence Review 53(3) (2020), 2237–2264.

15. Kapileswar, Phani Kumar, Energy efficient routing in IOT based UWSN using bald eagle search algorithm, Transactions on Emerging Telecommunications Technologies 33(1) (2022), e4399.

16. Nicaire N.F., Steve P.N., Salome N.E., Grégroire A.O., Parameter estimation of the photovoltaic system using Bald Eagle Search (BES) algorithm, International Journal of Photoenergy 2021 (2021).

17. Thaher, Chantar, Too, Mafarja, Turabieh, Houssein E.H., Boolean Particle Swarm Optimization with various Evolutionary Population Dynamics approaches for feature selection problems, Expert Systems with Applications 195 (2022), 116550.

18. Kennedy, Eberhart, Particle swarm optimization, in: Proceedings of ICNN'95 - International Conference on Neural Networks, Vol. 4, IEEE, 1995, pp. 1942–1948.

19. Houssein E.H., Ewees A.A., ElAziz M.A., Improving twin support vector machine based on hybrid swarm optimizer for heartbeat classification, Pattern Recognition and Image Analysis 28(2) (2018), 243–253.

20. Khare, Rangnekar, A review of particle swarm optimization and its applications in solar photovoltaic system, Applied Soft Computing 13(5) (2013), 2997–3006.

21. Raidl G.R., A unified view on hybrid metaheuristics, in: International Workshop on Hybrid Metaheuristics, Springer, 2006, pp. 1–12.

22. Blum, Puchinger, Raidl G.R., Roli, Hybrid metaheuristics in combinatorial optimization: A survey, Applied Soft Computing 11(6) (2011), 4135–4151.

23. Kohavi, John G.H., The wrapper approach, in: Feature Extraction, Construction and Selection, Springer, 1998, pp. 33–50.

24. Ghosh, Guha, Sarkar, Abraham, A wrapper-filter feature selection technique based on ant colony optimization, Neural Computing and Applications 32(12) (2020), 7839–7857.

25. Sharkawy, Ibrahim, Salama, Bartnikas, Particle swarm optimization feature selection for the classification of conducting particles in transformer oil, IEEE Transactions on Dielectrics and Electrical Insulation 18(6) (2011), 1897–1907.

26. Sakri S.B., Rashid N.B.A., Zain Z.M., Particle swarm optimization feature selection for breast cancer recurrence prediction, IEEE Access 6 (2018), 29637–29647.

27. Inbarani H.H., Azar A.T., Jothi, Supervised hybrid feature selection based on PSO and rough sets for medical diagnosis, Computer Methods and Programs in Biomedicine 113(1) (2014), 175–185.

28. Chuang L.-Y., Yang C.-H., J.-C., Chaotic maps based on binary particle swarm optimization for feature selection, Applied Soft Computing 11(1) (2011), 239–248.

29. Shami T.M., El-Saleh A.A., Alswaitti, Al-Tashi, Summakieh M.A., Mirjalili, Particle swarm optimization: A comprehensive survey, IEEE Access (2022).

30. Emary, Zawbaa H.M., Hassanien A.E., Binary Grey Wolf Optimization Approaches for Feature Selection, Neurocomputing 172 (2016), 371–381.

31. Rahman C.M., Rashid T.A., Alsadoon, Bacanin, Fattah, Mirjalili, A survey on dragonfly algorithm and its applications in engineering, Evolutionary Intelligence (2021), 1–21.

32. Dadaneh B.Z., Markid H.Y., Zakerolhosseini, Unsupervised probabilistic feature selection using ant colony optimization, Expert Systems with Applications 53 (2016), 27–42.

33. Huda R.K., Banka, Efficient Feature Selection and Classification Algorithm Based on PSO and Rough Sets, Neural Computing and Applications 31(8) (2019), 4287–4303.

34. Zhang, Song X.-F., Gong D.-W., A return-cost-based binary firefly algorithm for feature selection, Information Sciences 418 (2017), 561–574.

35. Mafarja, Mirjalili, Whale optimization approaches for wrapper feature selection, Applied Soft Computing 62 (2018), 441–453.

36. Peng, Ying, Tan, Sun, An improved feature selection algorithm based on ant colony optimization, IEEE Access 6 (2018), 69203–69209.

37. Dong, Ding, Sun, A novel hybrid genetic algorithm with granular information for feature selection and optimization, Applied Soft Computing 65 (2018), 33–46.

38. Ghosh K.K., Ahmed, Singh P.K., Geem Z.W., Sarkar, Improved binary sailfish optimizer based on adaptive β-hill climbing for feature selection, IEEE Access 8 (2020), 83548–83560.

39. Kang, Sun, Feng, IBDA: Improved binary dragonfly algorithm with evolutionary population dynamics and adaptive crossover for feature selection, IEEE Access PP(99) (2020), 1–1.

40. Sayed G.I., Hassanien A.E., Azar A.T., Feature selection via a novel chaotic crow search algorithm, Neural Computing and Applications 31(1) (2019), 171–188.

41. Bharanidharan, Sannasi Chakravarthy, Rajaguru, Improved Bald Eagle Search Optimization for Enhancing the Performance of Supervised Classifiers in Dementia Diagnosis, in: Kuala Lumpur International Conference on Biomedical Engineering, Springer, 2022, pp. 59–66.

42. Sayed G.I., Soliman M.M., Hassanien A.E., A novel melanoma prediction model for imbalanced data using optimized SqueezeNet by bald eagle search optimization, Computers in Biology and Medicine 136 (2021), 104712.

43. Zhang, Zhou, Luo, Zhu, A Curve Approximation Approach Using Bio-inspired Polar Coordinate Bald Eagle Search Algorithm, International Journal of Computational Intelligence Systems 15(1) (2022), 1–25.

44. Talbi E.-G., A taxonomy of hybrid metaheuristics, Journal of Heuristics 8(5) (2002), 541–564.

45. I.-S., Lee J.-S., Moon B.-R., Hybrid genetic algorithms for feature selection, IEEE Transactions on Pattern Analysis and Machine Intelligence 26(11) (2004), 1424–1437.

46. Nemati, Basiri M.E., Ghasem-Aghaee, Aghdam M.H., A novel ACO–GA hybrid algorithm for feature selection in protein function prediction, Expert Systems with Applications 36(10) (2009), 12086–12094.

47. Zheng, Wang, Chen, Fan, Cui, A novel hybrid algorithm for feature selection, Personal and Ubiquitous Computing 22(5) (2018), 971–985.

48. Mafarja, Qasem, Heidari A.A., Aljarah, Faris, Mirjalili, Efficient Hybrid Nature-Inspired Binary Optimizers for Feature Selection, Cognitive Computation 12(1) (2020), 150–175.

49. Qasim O.S., Mahmoud M.S., Hasan F.M., Hybrid binary dragonfly optimization algorithm with statistical dependence for feature selection, International Journal of Mathematical, Engineering and Management Sciences 5(6) (2020), 1420.

50. Aouari, Mansour R.F., A Hybrid Algorithm Based on PSO and GA for Feature Selection, Journal of Cybersecurity 3(2) (2021), 117.

51. Chen, Zhou F.-Y., Yuan X.-F., Hybrid particle swarm optimization with spiral-shaped mechanism for feature selection, Expert Systems with Applications 128 (2019), 140–156.

52. Adamu, Abdullahi, Junaidu S.B., Hassan I.H., An hybrid particle swarm optimization with crow search algorithm for feature selection, Machine Learning with Applications 6 (2021), 100108.

53. Tawhid M.A., Dsouza K.B., Hybrid binary bat enhanced particle swarm optimization algorithm for solving feature selection problems, Applied Computing and Informatics (2018).

54. Al-Tashi, Kadir, Rais H.M., Mirjalili, Alhussian, Binary Optimization Using Hybrid Grey Wolf Optimization for Feature Selection, IEEE Access (2019).

55. Wolpert D.H., Macready W.G., No free lunch theorems for optimization, IEEE Transactions on Evolutionary Computation 1(1) (1997), 67–82.

56. Al-Tashi, Rais, Abdulkadir S.J., Hybrid swarm intelligence algorithms with ensemble machine learning for medical diagnosis, in: 2018 4th International Conference on Computer and Information Sciences (ICCOINS), IEEE, 2018, pp. 1–6.

57. Bashir, Qamar, Khan F.H., BagMOOV: A novel ensemble for heart disease prediction bootstrap aggregation with multi-objective optimized voting, Australasian Physical & Engineering Sciences in Medicine 38(2) (2015), 305–323.

58. Arora, Anand, Binary butterfly optimization approaches for feature selection, Expert Systems with Applications 116 (2019), 147–160.

59. Aljarah, Al-Zoubi, Faris, Hassonah M.A., Mirjalili, Saadeh, Simultaneous feature selection and support vector machine optimization using the grasshopper optimization algorithm, Cognitive Computation 10(3) (2018), 478–495.

60. Kang, Sun, Liang, Liu, Zhang, Physical Layer Secure Communications Based on Collaborative Beamforming for UAV Networks: A Multi-objective Optimization Approach, in: IEEE INFOCOM 2021 - IEEE Conference on Computer Communications, IEEE, 2021, pp. 1–10.

61. Nguyen B.H., Xue, Andreae, Zhang, A new binary particle swarm optimization approach: Momentum and dynamic balance between exploration and exploitation, IEEE Transactions on Cybernetics 51(2) (2019), 589–603.

62. Luh G.-C., Lin C.-Y., Lin Y.-S., A binary particle swarm optimization for continuum structural topology optimization, Applied Soft Computing 11(2) (2011), 2833–2844.

63. Chauhan, Pant, Deep, Gompertz PSO variants for Knapsack and Multi-Knapsack problems, Applied Mathematics-A Journal of Chinese Universities 36 (2021), 611–630.

64. Chhabra, Hussien A.G., Hashim F.A., Improved bald eagle search algorithm for global optimization and feature selection, Alexandria Engineering Journal 68 (2023), 141–180.

65. Xie, Zhu, Wang, Cheng, Sani A.S., Yuan, Yang, A novel directional and non-local-convergent particle swarm optimization based workflow scheduling in cloud–edge environment, Future Generation Computer Systems 97 (2019), 361–378.

66. Alatas, Akin, Bedri Ozer, Chaos embedded particle swarm optimization algorithms, Chaos, Solitons & Fractals 40(4) (2009), 1715–1734.

67. Demir F.B., Tuncer, Kocamaz A.F., A chaotic optimization method based on logistic-sine map for numerical function optimization, Neural Computing and Applications 32(17) (2020), 14227–14239.

68. Mirjalili, Lewis, S-shaped versus V-shaped transfer functions for binary Particle Swarm Optimization, Swarm and Evolutionary Computation 9 (2013), 1–14.

69. Guha, Das, Singh P.K., Ahmadian, Senu, Sarkar, Hybrid feature selection method based on harmony search and naked mole-rat algorithms for spoken language identification from audio signals, IEEE Access 8 (2020), 182868–182887.

70. Liu J.H., Yang R.H., Sun S.H., The analysis of binary particle swarm optimization, Journal of Nanjing University (Natural Sciences) (2011).

71. Faris, Mafarja M.M., Heidari A.A., Aljarah, Al-Zoubi A.M., Mirjalili, Fujita, An Efficient Binary Salp Swarm Algorithm with Crossover Scheme for Feature Selection Problems, Knowledge-Based Systems 154 (2018), 43–67.

72. Bache, Lichman, UCI Machine Learning Repository (2013).

73. Tran, Xue, Zhang, Variable-length particle swarm optimization for feature selection on high-dimensional classification, IEEE Transactions on Evolutionary Computation 23(3) (2018), 473–487.

74. Huda R.K., Banka, New efficient initialization and updating mechanisms in PSO for feature selection and classification, Neural Computing and Applications 32(8) (2020), 3283–3294.

75. Abdel-Basset, El-Shahat, Sangaiah A.K., A modified nature inspired meta-heuristic whale optimization algorithm for solving 0–1 knapsack problem, International Journal of Machine Learning and Cybernetics 10 (2019), 495–514.

76. Xue, Zhang, Browne W.N., A Comprehensive Comparison on Evolutionary Feature Selection Approaches to Classification, International Journal of Computational Intelligence and Applications 14(2) (2015), 1550008.

77. Emary, Zawbaa H.M., Hassanien A.E., Binary ant lion approaches for feature selection, Neurocomputing 213 (2016), 54–65.

78. Lin S.-W., Ying K.-C., Chen S.-C., Lee Z.-J., Particle Swarm Optimization for Parameter Determination and Feature Selection of Support Vector Machines, Expert Systems with Applications 35(4) (2008), 1817–1824. doi: 10.1016/j.eswa.2007.08.088.

79. Mafarja, Aljarah, Heidari A.A., Faris, Fournier-Viger, Mirjalili, Binary Dragonfly Optimization for Feature Selection Using Time-Varying Transfer Functions, Knowledge-Based Systems 161 (2018), 185–204.

80. Sedgwick, A comparison of parametric and non-parametric statistical tests, BMJ 350 (2015).

81. Ramadan, Kamel, Hassan M.H., Khurshaid, Rahmann, An Improved Bald Eagle Search Algorithm for Parameter Estimation of Different Photovoltaic Models, Processes 9(7) (2021), 1127.

82. Fathy, Ferahtia, Rezk, Yousri, Abdelkareem M.A., Olabi A.G., Robust parameter estimation approach of Lithium-ion batteries employing bald eagle search algorithm, International Journal of Energy Research (2022).

83. Mafarja, Aljarah, Heidari A.A., Hammouri A.I., Faris, Al-Zoubi A.M., Mirjalili, Evolutionary Population Dynamics and Grasshopper Optimization Approaches for Feature Selection Problems, Knowledge-Based Systems 145 (2018), 25–45.

84. Shaheen A.M., El-Sehiemy R.A., Application of multi-verse optimizer for transmission network expansion planning in power systems, in: 2019 International Conference on Innovative Trends in Computer Engineering (ITCE), IEEE, 2019, pp. 371–376.

85. Abdel-Basset, Shawky L.A., Eldrandaly, Grid quorum-based spatial coverage for IoT smart agriculture monitoring using enhanced multi-verse optimizer, Neural Computing and Applications 32 (2020), 607–624.

86. Faris, Hassonah M.A., Al-Zoubi A.M., Mirjalili, Aljarah, A multi-verse optimizer approach for feature selection and optimizing SVM parameters based on a robust system architecture, Neural Computing and Applications 30 (2018), 2355–2369.

87. Tizhoosh H.R., Opposition-based learning: a new scheme for machine intelligence, in: International Conference on Computational Intelligence for Modelling, Control and Automation and International Conference on Intelligent Agents, Web Technologies and Internet Commerce (CIMCA-IAWTIC'06), Vol. 1, IEEE, 2005, pp. 695–701.

88. Kamaruzaman A.F., Zain A.M., Yusuf S.M., Udin, Levy flight algorithm for optimization problems - a literature review, Applied Mechanics and Materials 421 (2013), 496–501.

Evolutionary feature selection based on hybrid bald eagle search and particle swarm optimization

