Abstract
Nowadays, support vector machines (SVMs) are widely applied to land cover classification, although the method is sensitive to parameter selection and noise samples. AdaBoost is an effective approach for building a highly accurate classifier by combining many weak classifiers. In this article, a novel ensemble support vector machine model based on the AdaBoost approach is proposed to mitigate the influence of noise and poorly chosen parameters, with a focus on land cover classification. The key characteristics of this approach are that (1) a novel noise filtering scheme based on fuzzy clustering and the principal component analysis algorithm is proposed to remove both attribute noise and class noise and obtain an optimally clean training set and (2) support vector machine classifiers whose parameters are tuned by the particle swarm optimization algorithm serve as component classifiers. The individual predictions on this new training set are finally combined through the AdaBoost algorithm to induce the final classification results. A set of experiments on land cover classification is conducted to test the performance of the proposed algorithm. Experimental results show that classification accuracy can be increased using the proposed learning model, which yields the smallest generalization error compared with the other learning methods.
Introduction
Land cover directly reflects information about the earth's surface coverage, which is closely related to human production and daily life. Extracting accurate and timely information from remote sensing imagery is important for the social economy and the ecological environment. At present, many machine-learning algorithms are widely used in remote sensing classification, such as neural networks,1,2 decision trees,3,4 the maximum likelihood method,5 rough set theory,6 and support vector machines (SVMs).7 SVMs are a recent research focus in the fields of machine learning and pattern recognition. SVMs have been proposed to tackle remote sensing imagery classification because they handle high-dimensional and multi-dimensional data well.8–12 Although SVMs perform very well in the extraction of remote sensing information, problems such as parameter selection and noise-sample interference remain. In real-world datasets, class noise can confuse a machine-learning algorithm in the training phase.13,14 Noise in classification problems comes in two forms: class noise and attribute noise. Class noise consists of ineffective samples, namely misclassified samples and contradictory samples.15 Attribute noise is introduced by erroneous or missing attribute values. The parameters of SVMs mainly include the kernel function parameters and the penalty parameter.
Inspired by the idea of multi-information fusion,16,17 the basic idea of AdaBoost ensemble learning is to improve the performance of a single classifier by combining multiple classifiers. Its learning performance depends on the diversity of the individual learners and on the effective combination of the classifiers. Breiman18 presented the well-known Bagging algorithm based on re-sampling. Subsequently, Freund and Schapire19 improved the boosting algorithm and proposed the adaptive boosting (AdaBoost) algorithm. AdaBoost creates a set of component classifiers by maintaining a set of weights on the training samples and adaptively adjusting these weights. It has become the most popular ensemble method and has been widely used in many fields. Pal introduced finite ensemble approaches based on boosting and bagging, as well as an infinite ensemble created by embedding an infinite hypothesis set in the kernel of SVMs, and compared the performance of the two ensemble approaches on land cover classification; the results showed that the boosting algorithm can improve the classification performance of SVMs for land cover classification.20 Maulik and Chakraborty21 presented a semi-supervised ensemble SVM classifier for remote sensing image classification. An SVM-based IEL method was introduced by Yang et al.22 to significantly improve the accuracy and efficiency of remote sensing classification. Li et al.23 proposed an AdaBoost variant that incorporates properly designed radial basis function SVM (RBFSVM) component classifiers, called AdaBoostSVM. The results show that it is superior to the single-SVM classification model.
In our work, a novel ensemble SVM learning scheme based on AdaBoost is proposed to overcome the shortcomings of a single SVM. First, the SVM kernel function parameters and penalty parameter are optimized by the particle swarm optimization (PSO) algorithm. Then, the principal component analysis (PCA) method is adopted to eliminate noisy features from the remote sensing image. At the same time, fuzzy c-means (FCM) clustering is introduced to reduce class noise. Finally, the improved SVM is used as a base classifier to train different classifiers on the same training set. Several weak classifiers are generated and combined into a stronger final classifier using the AdaBoost algorithm to further improve generalization ability. To assess the proposed model, experiments are conducted on the land cover classification problem.
Study methods
Formulation of SVMs
SVMs, developed by Vapnik,24 are a supervised learning algorithm based on the theory of structural risk minimization. Given a training set of instances {x_i, y_i}, where i = 1, 2, …, n, x_i ∈ R^d, y_i ∈ {−1, +1}, and n is the number of samples, the soft-margin SVM for a binary classification problem solves the primal problem

Minimize

$$\frac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{n}\xi_i$$

Subject to

$$y_i(w\cdot x_i + b) \ge 1 - \xi_i,\quad \xi_i \ge 0,\quad i = 1,\dots,n$$

where w is the normal vector of the separating hyperplane, b is the bias, ξ_i are slack variables, and C is the penalty parameter. By introducing Lagrange multipliers α_i, the corresponding dual problem is

Maximize

$$\sum_{i=1}^{n}\alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j y_i y_j K(x_i, x_j)$$

Subject to

$$\sum_{i=1}^{n}\alpha_i y_i = 0,\quad 0 \le \alpha_i \le C$$

where K(x_i, x_j) is the kernel function. Next, the nonlinear decision function is written as

$$f(x) = \operatorname{sign}\left(\sum_{i=1}^{n}\alpha_i y_i K(x_i, x) + b\right)$$

The kernel function maps the input data into a high-dimensional feature space in which a linear separation becomes possible. The equation of the RBF kernel function is

$$K(x_i, x_j) = \exp\left(-\gamma\lVert x_i - x_j\rVert^2\right)$$
For SVMs, there are four common kernel functions: linear, polynomial, RBF, and sigmoid. Among them, the RBF kernel performs excellently in many practical applications, and it is therefore used in this article. The performance of SVMs is affected by the values of the penalty parameter C and the kernel parameter γ.
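As a quick illustration of the two hyper-parameters just mentioned, an RBF-kernel SVM with explicit C and γ can be fitted in a few lines (a sketch on toy two-class data; scikit-learn and the parameter values here are illustrative assumptions, not the article's experimental setup):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two Gaussian clouds as a toy binary problem (class A vs class B)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)),
               rng.normal(3.0, 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Penalty parameter C and RBF kernel width gamma are the two
# hyper-parameters the article later tunes with PSO.
clf = SVC(kernel="rbf", C=10.0, gamma=0.5).fit(X, y)
train_acc = clf.score(X, y)
```

Different (C, γ) pairs change both the margin width and the flexibility of the decision boundary, which is exactly why their selection matters.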
An SVM classification model is constructed by training on labeled examples so that it can reliably predict the class labels of new, previously unlabeled examples. Obviously, the model obtained will be negatively affected if the data used to train it contain noise samples. To better illustrate the effect of noise on SVM classifiers, we compare the effect of noisy and noiseless samples on SVM classifiers, as shown in Figure 1. Two kinds of samples (class A and class B) are randomly generated in MATLAB. We use the traditional SVM algorithm as the classifier and choose the RBF kernel; the same penalty parameter and kernel parameter values are used in both cases.

Visualization of SVM classification results, where green dots denote the samples of class A, blue dots denote the samples of class B, red circles denote the support vectors, and purple dots denote the noise samples in (b).
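The effect shown in Figure 1 can be reproduced with a short sketch (here in Python with scikit-learn rather than MATLAB; the dataset and parameter values are illustrative assumptions): flipping a fraction of class labels forces many more training points to become support vectors, distorting the boundary.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)),
               rng.normal(4.0, 1.0, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

# Flip 10% of the labels to simulate class noise
y_noisy = y.copy()
flip = rng.choice(len(y), size=20, replace=False)
y_noisy[flip] = 1 - y_noisy[flip]

clean_model = SVC(kernel="rbf", C=1.0, gamma=0.5).fit(X, y)
noisy_model = SVC(kernel="rbf", C=1.0, gamma=0.5).fit(X, y_noisy)

# More training points become support vectors once noise is present
n_sv_clean = int(clean_model.n_support_.sum())
n_sv_noisy = int(noisy_model.n_support_.sum())
```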
Fuzzy clustering algorithm
Clustering partitions a given dataset into clusters in order to detect groups and discover hidden structural distributions and pattern features in the underlying data. In this article, the FCM clustering algorithm is used to obtain more reliable, noise-free training samples based on the membership values of different clusters. The FCM clustering algorithm is described as follows.
The FCM clustering algorithm is an unsupervised learning technique. Bezdek25 used fuzzy set theory to improve the clustering algorithm in order to overcome the limitations of hard clustering. The algorithm is based on the minimization of an objective function (the c-means functional), which can be defined as

$$J(X; U, V) = \sum_{i=1}^{c}\sum_{k=1}^{n}(\mu_{ik})^{m}\,\lVert x_k - v_i\rVert^2$$

where U = [μ_ik] is the fuzzy partition matrix with μ_ik ∈ [0, 1] the membership of sample x_k in cluster i, V = (v_1, v_2, …, v_c) is a vector of cluster prototypes (centers), m > 1 is the weighting exponent, and

$$D_{ik}^{2} = \lVert x_k - v_i\rVert^2$$

is a squared inner-product distance norm. The objective function can be regarded as a measure of the total variance of x_k from v_i. Its minimization is a nonlinear optimization problem, which can be solved using a variety of available methods, including grouped coordinate minimization, simulated annealing, and genetic algorithms. The stationary points of the objective function can be found by adjoining the constraint

$$\sum_{i=1}^{c}\mu_{ik} = 1,\quad k = 1,\dots,n$$

with Lagrange multipliers and setting the gradients of the resulting Lagrangian with respect to U and V to zero, which yields the alternating update equations

$$\mu_{ik} = \frac{1}{\sum_{j=1}^{c}\left(D_{ik}/D_{jk}\right)^{2/(m-1)}},\qquad v_i = \frac{\sum_{k=1}^{n}(\mu_{ik})^{m} x_k}{\sum_{k=1}^{n}(\mu_{ik})^{m}}$$

This solution also satisfies the remaining constraints μ_ik ∈ [0, 1] and 0 < Σ_k μ_ik < n.
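The alternating membership and center updates above can be sketched directly (a minimal implementation on toy data; the initialization scheme and parameter defaults are assumptions):

```python
import numpy as np

def fcm(X, c, m=2.0, tol=1e-4, max_iter=100, seed=0):
    """Fuzzy c-means: alternate the membership and center updates
    derived from the stationarity conditions of the objective."""
    rng = np.random.default_rng(seed)
    n = len(X)
    U = rng.random((c, n))
    U /= U.sum(axis=0)                 # memberships sum to 1 per sample
    for _ in range(max_iter):
        Um = U ** m
        V = Um @ X / Um.sum(axis=1, keepdims=True)      # center update
        D = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)
        D = np.fmax(D, 1e-12)          # avoid division by zero
        p = 2.0 / (m - 1.0)
        # membership update: 1 / sum_j (D_ik / D_jk)^p
        U_new = 1.0 / (D ** p * (1.0 / D ** p).sum(axis=0))
        if np.abs(U_new - U).max() < tol:
            U = U_new
            break
        U = U_new
    return U, V

# Usage: two well-separated clouds
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (30, 2)),
               rng.normal(5, 0.5, (30, 2))])
U, V = fcm(X, c=2)
```

Each column of U sums to one, and the two prototypes converge near the cluster means.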
A novel AdaBoost ensemble SVMs model
Parameters of SVMs optimized based on PSO (PSVMs)
PSO originated from the study of the foraging behavior of bird flocks, and is similar to genetic algorithms in that it is population based and driven by a fitness function. It also starts from a random solution and searches for the optimal solution by iteration. Each particle i has two key attributes, velocity and position, which are calculated and updated in each iteration. The steps of the PSVMs algorithm are shown in Table 1.
PSVMs algorithm.
Here, we set up the particle components and the fitness function. In this article, SVMs with the RBF kernel are the base classifiers of the proposed model, so each particle comprises the penalty parameter C and the kernel parameter γ. The fitness function is defined as the misclassification rate

$$f = \frac{up}{n}$$

where up denotes the number of misclassified samples and n is the total number of samples; smaller fitness values indicate better parameter choices.
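The PSVMs idea can be sketched with a minimal PSO loop tuning C and γ (the search ranges, swarm settings, and validation-split fitness here are illustrative assumptions, not the article's exact configuration):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1.5, (60, 2)),
               rng.normal(3, 1.5, (60, 2))])
y = np.array([0] * 60 + [1] * 60)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3,
                                          random_state=0)

def fitness(p):
    """Misclassification rate on a validation split; p = (C, gamma)."""
    C, gamma = p
    model = SVC(kernel="rbf", C=C, gamma=gamma).fit(X_tr, y_tr)
    return 1.0 - model.score(X_va, y_va)

# Minimal PSO over log10(C) in [-2, 3] and log10(gamma) in [-3, 1]
n_particles, n_iter = 10, 15
lo, hi = np.array([-2.0, -3.0]), np.array([3.0, 1.0])
pos = rng.uniform(lo, hi, (n_particles, 2))
vel = np.zeros_like(pos)
pbest = pos.copy()
pbest_f = np.array([fitness(10.0 ** p) for p in pos])
gbest = pbest[pbest_f.argmin()].copy()
w, c1, c2 = 0.7, 1.5, 1.5          # inertia and acceleration weights
for _ in range(n_iter):
    r1 = rng.random((n_particles, 2))
    r2 = rng.random((n_particles, 2))
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, lo, hi)
    f = np.array([fitness(10.0 ** p) for p in pos])
    improved = f < pbest_f
    pbest[improved], pbest_f[improved] = pos[improved], f[improved]
    gbest = pbest[pbest_f.argmin()].copy()

best_C, best_gamma = 10.0 ** gbest
best_error = pbest_f.min()
```

Searching in log space keeps the step sizes meaningful across the very different scales of C and γ.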
Eliminating noise method
The presence of irrelevant features and noise can deteriorate the performance of the assessment model.26,27 We therefore propose an eliminating-noise (EN) algorithm to handle noisy datasets. First, PCA is used to form a new subset of attributes, removing redundant, strongly correlated attributes that carry little useful information. Second, the FCM algorithm is introduced to detect noise based on the movement of the cluster centers in the training dataset. The mean of the samples is taken as the initial center, and the membership values are calculated through clustering. In each iteration, the samples with the smallest memberships, which are suspected to be boundary data containing noise, are eliminated. Clustering is then repeated until the displacement of the cluster center of each class satisfies the stopping condition. The specific steps are shown in Table 2.
Eliminating noises algorithm.
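A minimal sketch of the EN idea follows (distance to the class center stands in here for the FCM membership ranking, and the data are synthetic; this is an illustration of the two stages, not the article's exact procedure):

```python
import numpy as np

def pca_reduce(X, k):
    """Project the data onto its first k principal components
    (dropping low-variance, noise-dominated directions)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def filter_class_noise(X, y, drop_frac=0.05, rounds=2):
    """Per class, repeatedly drop the fraction of samples farthest from
    the class center -- a stand-in for the lowest FCM memberships."""
    keep = np.ones(len(y), dtype=bool)
    for _ in range(rounds):
        for cls in np.unique(y):
            idx = np.where(keep & (y == cls))[0]
            center = X[idx].mean(axis=0)
            dist = np.linalg.norm(X[idx] - center, axis=1)
            n_drop = max(1, int(round(drop_frac * len(idx))))
            keep[idx[np.argsort(dist)[-n_drop:]]] = False
    return keep

# Usage: 4 informative features plus 2 near-constant noise features
rng = np.random.default_rng(0)
X = np.hstack([rng.normal(0, 1, (200, 4)),
               rng.normal(0, 0.01, (200, 2))])
y = np.repeat([0, 1], 100)
X_red = pca_reduce(X, k=4)              # attribute-noise reduction
keep = filter_class_noise(X_red, y)     # class-noise reduction
```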
Ensemble SVMs model based on AdaBoost (AdaPSVMs)
AdaBoost enhances the generalization capability of classifiers by adaptively adjusting the weights of misclassified samples. In this study, PSVMs serve as the component learners. Given a set of training samples, AdaBoost maintains a weight distribution over them: in each round a component classifier is trained on the weighted samples, the weights of misclassified samples are increased so that subsequent classifiers concentrate on the hard examples, and the final classifier is a weighted majority vote of the component classifiers.
AdaPSVMs algorithm.
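The boosting loop can be sketched as a minimal AdaBoost.M1 with weighted RBF-SVM components (the toy data, parameter values, and helper names are assumptions, not the article's implementation):

```python
import numpy as np
from sklearn.svm import SVC

def adaboost_svm(X, y, T=5, C=1.0, gamma=0.5):
    """AdaBoost.M1 with weighted RBF-SVM component classifiers.
    Labels must be in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)            # initial weight distribution
    models, alphas = [], []
    for _ in range(T):
        clf = SVC(kernel="rbf", C=C, gamma=gamma)
        clf.fit(X, y, sample_weight=w)
        pred = clf.predict(X)
        err = w[pred != y].sum()       # weighted training error
        if err >= 0.5:                 # no better than chance: stop
            break
        err = max(err, 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        w *= np.exp(-alpha * y * pred)  # up-weight misclassified samples
        w /= w.sum()
        models.append(clf)
        alphas.append(alpha)

    def predict(X_new):
        scores = sum(a * m.predict(X_new) for a, m in zip(alphas, models))
        return np.sign(scores)
    return predict

# Usage on toy two-class data
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1.2, (80, 2)),
               rng.normal(3, 1.2, (80, 2))])
y = np.array([-1] * 80 + [1] * 80)
ensemble = adaboost_svm(X, y, T=5)
acc = (ensemble(X) == y).mean()
```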
Accuracy assessment
This article uses the kappa coefficient as an evaluation index in addition to the overall classification accuracy. The formula is

$$\kappa = \frac{N\sum_{i=1}^{r} x_{ii} - \sum_{i=1}^{r} x_{i+}\, x_{+i}}{N^{2} - \sum_{i=1}^{r} x_{i+}\, x_{+i}}$$

where r is the number of rows of the error matrix, x_ii is the value in the ith row and ith column, x_{i+} is the sum of the ith row, x_{+i} is the sum of the ith column, and N is the total number of samples.
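The kappa computation can be checked with a short sketch (the error matrix below is hypothetical):

```python
import numpy as np

def kappa(confusion):
    """Kappa coefficient from an r x r error (confusion) matrix."""
    M = np.asarray(confusion, dtype=float)
    N = M.sum()
    observed = np.trace(M) / N                          # overall accuracy
    expected = (M.sum(axis=1) * M.sum(axis=0)).sum() / N ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical 3-class error matrix (rows: reference, columns: predicted)
conf = [[50, 2, 3],
        [4, 40, 6],
        [1, 5, 39]]
k = kappa(conf)
```

A perfectly diagonal matrix gives kappa = 1, and agreement no better than chance gives kappa near 0.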
Results and discussions
Study area
The Tumen River area, located in the border region of China, North Korea, and Russia (41°06′–44°05′ N and 127°39′–131°44′ E), is shown in Figure 2. The region has varied geomorphologic types and rich natural resources, which make the land cover categories diverse. Landsat-5 TM image data acquired on 30 September 2009 are used for the experiment. The spectral bands include data in the blue (band 1), green (band 2), red (band 3), near-infrared (band 4), and mid-infrared (bands 5 and 7) regions of the electromagnetic spectrum. The thermal band (band 6) is eliminated because it has a larger pixel size and less information for vegetation classification than the other bands.

Location of the study area.
Sample set
In the study area, the land cover classes include forest, farmland, building land, water, and others, with category codes ω1–ω5 according to the land cover classification system in Northeast China. The numbers of samples and the land cover types are listed in Table 4. The sample set is formed using a random pixel selection strategy, which guarantees maximum variation and representativeness for each class. We divided the dataset of 3838 samples into two subsets, one for training (2059 samples) and the other for testing (1779 samples).
Name of classes and number of samples.
Feature selection
Feature selection can be regarded as the process of selecting a minimum subset of m features from the original dataset of n features (m < n). Its main purpose is to improve the generalization ability of the classifier and reduce the computational complexity. In this study, we take into account the following three aspects, and finally eight features are adopted, which include TM image band information, PCA principal component, and normalized difference vegetation index (NDVI).
TM band selection: the band features include bands 1–5 and band 7 of the TM image, six bands in total. Band 6 is deleted because it is a thermal infrared band with low spatial resolution (120 m) and generally does not participate in band synthesis.
Eliminate redundant features: PCA is used to remove the repetitive and redundant information between the bands of multi-spectral remote sensing images. The TM images obtained in the study area in September 2009 are used in the PCA test, and the results are shown in Figure 3. In Figure 3(a), it is easy to see that band 1 contains the most abundant information. In contrast, band 6 (Figure 3(f)) and band 7 (Figure 3(g)) contain very little information and much noise. Table 5 also shows that the eigenvalues of bands 1, 2, and 3 are the largest among the seven TM bands, with band 1 having the largest value, 493.3175. Therefore, this article chooses the first principal component of the PCA as one of the classification features.
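The way PCA eigenvalues reveal a dominant first component in correlated bands can be illustrated with a small sketch (synthetic "bands", not the TM imagery):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stack of 6 strongly correlated "bands" (pixels x bands):
# a shared signal plus a little independent noise per band
base = rng.normal(size=(500, 1))
bands = base @ rng.normal(size=(1, 6)) + 0.1 * rng.normal(size=(500, 6))

cov = np.cov(bands, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]   # descending order
explained = eigvals[0] / eigvals.sum()             # variance share of PC1
```

When the bands are highly correlated, the first eigenvalue dwarfs the rest, which is why keeping only the first principal component preserves most of the information.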
NDVI: NDVI reflects vegetation growth status and vegetation coverage well, and is one of the most effective parameters for evaluating vegetation condition. NDVI is therefore selected as an important classification feature, and the result is shown in Figure 3(h). It is computed as

$$\mathrm{NDVI} = \frac{B4 - B3}{B4 + B3}$$

where B4 and B3 are the near-infrared band and red band reflectance values, respectively.
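The NDVI formula is straightforward to apply per pixel (the reflectance values below are hypothetical):

```python
import numpy as np

# Hypothetical reflectance values for the near-infrared (B4)
# and red (B3) bands of three pixels
B4 = np.array([0.50, 0.40, 0.30])
B3 = np.array([0.10, 0.20, 0.25])
ndvi = (B4 - B3) / (B4 + B3)   # always falls in [-1, 1]
```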

Principal component analysis results in some areas of the study area, where (a)–(g) are the results of principal component analysis of bands 1–7, and (h) is the result of NDVI analysis.
Principal component analysis eigenvalue.
Reducing class noises by FCM clustering
Next, we apply the FCM algorithm to the training subset. An optimal dataset is obtained through several rounds of clustering that gradually eliminate class noise.
To preserve the integrity and accuracy of the test, we follow two rules: (1) the total number of class noise samples removed is less than 1/10 of the samples of each class and (2) the number of class noise samples selected in each iteration is 0.5% of the samples of each class. Clustering stops when the displacement of the cluster centers between iterations is less than 0.0001. For the FCM parameters, the weighting exponent m, which determines the fuzziness of the clusters, is set to its default value of 2, and the termination tolerance e of the clustering method is set to its default of 0.001. The number of original training samples is 2059, and the number of training samples after denoising is 1871. The results are shown in Table 6.
Number of training samples after denoising.
Comparison of the classification models
In this section, we apply AdaPSVMs to land cover classification on part of the subset images and compare it with two other classification algorithms, the traditional SVMs and PSVMs. The classification results obtained by the three algorithms, including the parameter values, kappa coefficients, and overall accuracies, are presented in Table 7. Compared with the SVMs and PSVMs models, the AdaPSVMs model achieves the highest overall accuracy and kappa coefficient.
Classification results obtained using AdaPSVMs, PSVMs, and SVMs.
Conclusion
Parameter values and noise samples can potentially corrupt the performance of SVM learning schemes. In this article, an ensemble SVM model using the AdaBoost algorithm, called AdaPSVMs, is presented to reduce the impact of these issues. PCA is first used to eliminate redundant features of the sample set. Then, FCM is applied to search for and eliminate class noise. The PSO algorithm is also used to search for the optimal SVM parameters. Our aim is to obtain the minimum generalization error through a noise-free dataset and an ensemble method. Finally, we apply the model to the land cover classification problem. Experimental results demonstrate that the proposed AdaPSVMs model performs better than the PSVMs and SVMs models.
Footnotes
Handling Editor: Daming Zhou
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by a grant from the National Natural Science Foundation of China (no. 61402193), the Foundation of Jilin Provincial Science & Technology Department (no. 20180101337JC), the Science and Technology Bureau of Changchun (no. 17DY009), the National Society Science Foundation of China (no. 15BGL090), the National Natural Science Foundation of China (nos 61572225 and 61806082), and the open foundation of Laboratory of Logistics Industry Economy and Intelligent Logistics in Jilin University of Finance and Economics (no. 201702).
