Abstract
Twin support vector machine (TWSVM) and projection twin support vector machine (PTSVM) are two extensions of the traditional support vector machine (SVM). However, neither TWSVM nor PTSVM considers the local geometrical structure information of the training samples. Therefore, a locality preserving projection twin support vector machine (LPPTSVM) is presented by introducing the basic idea of locality preserving projection (LPP) into PTSVM. This method not only inherits the ability of TWSVM and PTSVM to deal with the XOR problem, but also fully considers the local geometrical structure between samples and reveals the local underlying discriminatory information. For the linear LPPTSVM method, a regularization technique is used to overcome the singularity problem, and the nonlinear LPPTSVM method is then constructed by the empirical kernel mapping. Experimental results on artificial datasets and UCI datasets illustrate the effectiveness of the LPPTSVM method.
Introduction
Support vector machine (SVM) is one of the most effective machine learning tools for binary classification and regression1,2 and has been widely applied to a variety of real-world problems such as image classification,3 text categorization4 and bioinformatics.5 However, SVM suffers from high computational complexity, and its solution does not fully take the class distribution into account.6 Recently, multi-hyperplane support vector machines such as the twin support vector machine (TWSVM)7 and the projection twin support vector machine (PTSVM),8 as an extension of SVM, have become a hot research topic in the field of pattern recognition. In 2006, Mangasarian and Wild9 proposed a nonparallel plane classifier for binary data classification, termed the generalized eigenvalue proximal support vector machine (GEPSVM). The essence of GEPSVM is to seek two nonparallel planes such that the data points of each class are proximal to one of them. GEPSVM discards the parallelism condition of the proximal support vector machine (PSVM)10 and requires that each hyperplane be as close as possible to the data points of one class and as far as possible from those of the other. Following this line, multiple surface support vector machines have been widely investigated, e.g. TWSVM,7 PTSVM,8 MVSVM,11 TBSVM,12 and so on.13–17 However, in the process of learning, the above-mentioned classification methods do not fully consider the local geometrical structure information of the samples. In order to effectively reveal the local geometrical structure of the samples, many scholars have conducted extensive research and achieved rich results, e.g. isometric mapping (IM),18 locally linear embedding (LLE),19 Laplacian eigenmap (LE)20 and locality preserving projections (LPP).21 In particular, the LPP method can keep the local geometrical structure between the samples.
Furthermore, LPP can be easily extended to nonlinear embedding and can find the low-dimensional nonlinear manifold structure.22 In order to exploit the advantages of LPP in combination with SVM, the Laplacian support vector machine (LSVM) was proposed in Belkin et al.23 and the minimum class locality preserving variance support vector machine (MCLPVSVM) was proposed in Wang et al.24 However, these methods are direct combinations of LPP with the traditional SVM; there has been little research on combining LPP with multi-hyperplane support vector machines.
Based on the above analysis, in this paper, we propose a novel locality preserving projection twin support vector machine, termed LPPTSVM. LPPTSVM introduces the basic idea of LPP into PTSVM, and a regularization term is used to overcome the singularity problem. This method has the following advantages: first, LPPTSVM inherits the advantages of TWSVM and PTSVM, and can thus handle the crossplane (XOR) problem well; second, LPPTSVM fully considers the local geometrical structure information of the samples, which can improve the generalization ability of the algorithm to a certain extent; third, a regularization technique is used to overcome the singularity problem, which ensures the stability of the algorithm.
The rest of this paper is organized as follows. The related works will be briefly reviewed in the Related works section. The Locality preserving projection twin support vector machine section proposes our linear LPPTSVM and its nonlinear version, and in addition, the successive overrelaxation (SOR) algorithm is also proposed in this section. Experimental results are described in the Experimental results section, and conclusion and future works are presented in the last section.
Related works
Consider a binary classification problem in the n-dimensional real space R^n.
Twin support vector machine
For the linear case, the TWSVM7 determines two nonparallel hyperplanes, one for each class, such that each hyperplane is close to the data points of its own class and far from those of the other class.
Projection twin support vector machine
The central idea of the linear projection twin support vector machine8 is to find a projection axis for each class, such that the within-class variance of the projected data points of its own class is minimized while the projected data points of the other class are scattered away as far as possible. Thus, the primal problems of linear PTSVM are a pair of QPPs.
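To make the central idea concrete, the following sketch computes a projection axis in the PTSVM spirit for a toy two-class problem. Note that PTSVM itself solves a QPP; here a generalized-eigenvalue surrogate of the same objective (small projected within-class scatter of the own class, large projected scatter of the other class) is used purely for illustration, and the function name and regularization constant are assumptions.

```python
import numpy as np

def projection_axis(X_own, X_other):
    """Illustrative surrogate of the PTSVM idea: find a unit axis w that keeps
    the projected within-class variance of X_own small while pushing the
    projections of X_other away from the own-class mean."""
    mu = X_own.mean(axis=0)
    # within-class scatter of the own class (to be minimized along w)
    S_w = (X_own - mu).T @ (X_own - mu)
    # scatter of the other class around the own-class mean (to be maximized)
    S_b = (X_other - mu).T @ (X_other - mu)
    # small regularizer keeps S_w invertible (cf. the singularity issue below)
    eps = 1e-6 * np.eye(S_w.shape[0])
    # leading eigenvector of S_w^{-1} S_b maximizes the Rayleigh-type ratio
    vals, vecs = np.linalg.eig(np.linalg.solve(S_w + eps, S_b))
    w = vecs[:, np.argmax(vals.real)].real
    return w / np.linalg.norm(w)
```

For example, if the own class varies only along the x-axis while the other class is spread along the y-axis, the returned axis is (up to sign) close to (0, 1): projecting onto it collapses the own class and separates the other.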
Obviously, from equations (2) to (5), we can see that the objective functions of TWSVM and PTSVM do not consider the local geometrical structure between the samples.
Locality preserving projection twin support vector machine
Linear LPPTSVM
In order to overcome the singularity problem of the scatter matrix, a regularization term is added to the objective function, similar to the literature.12,15 We then introduce the basic idea of LPP into PTSVM and obtain the primal optimization problem as follows.
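The LPP ingredient referred to above is a neighbourhood graph over the training samples whose Laplacian encodes the local geometrical structure. A minimal sketch of its construction is given below; the neighbourhood size k, the heat-kernel width t and the function name are assumptions, not the paper's exact settings.

```python
import numpy as np

def lpp_laplacian(X, k=2, t=1.0):
    """Build the LPP-style graph Laplacian L = D - W, where W is a
    k-nearest-neighbour affinity matrix with heat-kernel weights and D is
    the corresponding degree matrix."""
    n = X.shape[0]
    # pairwise squared Euclidean distances between samples
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.zeros((n, n))
    for i in range(n):
        # k nearest neighbours of sample i (index 0 of argsort is i itself)
        nbrs = np.argsort(d2[i])[1:k + 1]
        W[i, nbrs] = np.exp(-d2[i, nbrs] / t)
    W = np.maximum(W, W.T)          # symmetrize the adjacency graph
    D = np.diag(W.sum(axis=1))      # degree matrix
    return D - W                    # graph Laplacian
```

The quadratic form w'X'LXw then penalizes projections that pull neighbouring samples apart, which is how the local structure enters the objective.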
In order to obtain the solution of the primal problem in equation (8), we need to derive its dual problem. Therefore, the Lagrangian function is introduced as follows
Obviously, equation (11) implies that
Since
Finally, putting equations (12) and (15) into the Lagrangian function equation (10), we obtain the dual problem as follows
In the same way, we can obtain the Wolfe dual problem of equation (9) as follows
By using the Lagrangian method and the KKT conditions, we obtain
Once
Nonlinear LPPTSVM
For the nonlinear classification problem, the most commonly used approach is to first introduce a nonlinear mapping that maps the samples from the input space into a high-dimensional feature space, and then use kernel techniques to perform the linear algorithm in the feature space. However, the locality-preserving within-class scatter matrix
First, for given kernel function
Second, we use empirical kernel mapping to map the training samples from input space into empirical feature space, that is
Therefore, the new training samples in empirical feature space can be expressed as
At last, we take
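The empirical kernel mapping described in these steps can be sketched as follows: each sample x is represented by the vector of its kernel evaluations against the m training samples, after which the linear LPPTSVM can be run directly on these m-dimensional vectors. The RBF kernel and the parameter gamma are assumptions for illustration; the paper's kernel choice may differ.

```python
import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    # Gaussian (RBF) kernel k(x, z) = exp(-gamma * ||x - z||^2)
    return np.exp(-gamma * np.sum((x - z) ** 2))

def empirical_kernel_map(X_train, X, gamma=1.0):
    """Map each row of X into the empirical feature space spanned by the m
    training samples: x -> (k(x, x_1), ..., k(x, x_m))."""
    return np.array([[rbf_kernel(x, z, gamma) for z in X_train] for x in X])
```

Mapping the training set against itself yields the m-by-m kernel matrix, so the linear algorithm applied in this space is a genuinely nonlinear classifier in the original input space.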
Implementation
In this section, we discuss the implementation of our proposed LPPTSVM. In our LPPTSVM, the dual problem can be rewritten as the following unified form
As we can see, if
In our proposed LPPTSVM, most of the computational cost is incurred in solving the dual QPP equation (24). In order to solve this QPP quickly, we use a very efficient optimization technique called the successive overrelaxation (SOR) algorithm, which can be found in the literature.12,15,27 SOR is an excellent QPP solver because it is able to deal with very large datasets that need not reside in memory.27
The SOR algorithm proceeds as follows:
Step 1. Select the relaxation parameter t in (0, 2) and an initial point, and set the iteration counter i = 0.
Step 2. Compute the next iterate from the current one by the SOR update and project it onto the feasible region.
Step 3. Stop if the distance between successive iterates is sufficiently small; otherwise set i = i + 1 and go to Step 2.
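The SOR iteration can be sketched for a generic box-constrained dual QPP of the form min 0.5 a'Qa - e'a subject to 0 <= a <= c. Here Q and c stand in for the actual problem data of equation (24), which is an assumption; this is an illustrative sketch, not the authors' exact implementation.

```python
import numpy as np

def sor_qpp(Q, c, t=1.0, tol=1e-6, max_iter=1000):
    """SOR sweep for min 0.5*a'Qa - e'a s.t. 0 <= a <= c.  The relaxation
    factor t lies in (0, 2); components of a are updated in place
    (Gauss-Seidel style) and projected back onto the box [0, c]."""
    n = Q.shape[0]
    a = np.zeros(n)
    for _ in range(max_iter):
        a_old = a.copy()
        for i in range(n):
            grad_i = Q[i] @ a - 1.0          # i-th component of the gradient
            a[i] = np.clip(a[i] - t * grad_i / Q[i, i], 0.0, c)
        if np.linalg.norm(a - a_old) < tol:  # stop when iterates converge
            break
    return a
```

Because each sweep only touches one row of Q at a time, the matrix never needs to be factorized, which is why SOR scales to large problems.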
Experimental results
In order to evaluate the proposed LPPTSVM, we investigate its classification accuracy and computational efficiency on three artificial datasets and a number of real-world UCI benchmark datasets. In the experiments, we focus on the comparison between our proposed LPPTSVM and several state-of-the-art classifiers, e.g. TWSVM,7 PTSVM8 and MVSVM.11 All methods are implemented in MATLAB R2013a on a personal computer (PC) with an Intel (R) Core (TM) processor (3.40 GHz) and 4 GB random-access memory (RAM). PTSVM and our proposed LPPTSVM are solved by the SOR algorithm. The eigenvalue problem in MVSVM is solved by the MATLAB function 'eig.m', and the QPPs in TWSVM are solved by the optimization toolbox QP in MATLAB. The accuracy used to evaluate the methods is defined as Accuracy = (TP + TN)/(TP + FP + TN + FN), where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives, respectively. The parameters are selected by employing the standard 10-fold cross-validation methodology.
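The accuracy measure defined above can be computed directly from the confusion counts; the small helper below (name assumed) does this for a binary +1/-1 labelling.

```python
def accuracy(y_true, y_pred):
    """Accuracy = (TP + TN) / (TP + FP + TN + FN) for binary +1/-1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == -1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == -1)
    return (tp + tn) / (tp + fp + tn + fn)
```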
Toy examples
In this subsection, three artificial datasets, namely the crossplane (XOR) dataset, a complex XOR dataset and the Two-moons manifold dataset,23 are used to show that our proposed LPPTSVM can deal with linearly inseparable problems. In the experiments, the XOR dataset contains 200 samples (100 positive samples and 100 negative samples) and the complex XOR dataset contains 260 samples (100 positive samples and 160 negative samples). Figure 1 shows the XOR and complex XOR datasets. The Two-moons manifold dataset contains 100 samples (50 positive samples and 50 negative samples), and Figure 2 shows two Two-moons datasets of different complexity.
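A crossplane (XOR) toy set of the size described above can be generated along the lines of the sketch below. The generating lines (y = x for the positive class, y = -x for the negative class), the noise level and the function name are all assumptions for illustration; the paper does not specify its generator.

```python
import numpy as np

def make_xor(n_per_class=100, noise=0.1, seed=0):
    """Hypothetical crossplane (XOR) generator: the positive class lies
    along y = x and the negative class along y = -x, each perturbed by
    Gaussian noise."""
    rng = np.random.default_rng(seed)
    t = rng.uniform(-1.0, 1.0, size=(2, n_per_class))
    pos = np.stack([t[0], t[0]], axis=1) + noise * rng.standard_normal((n_per_class, 2))
    neg = np.stack([t[1], -t[1]], axis=1) + noise * rng.standard_normal((n_per_class, 2))
    X = np.vstack([pos, neg])
    y = np.hstack([np.ones(n_per_class), -np.ones(n_per_class)])
    return X, y
```

Such data are linearly inseparable, which is exactly why the two nonparallel projection axes of TWSVM/PTSVM-type methods succeed where a single separating hyperplane fails.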
Figure 1. Crossplane (XOR) (a) and complex XOR (b) datasets.
Figure 2. Two kinds of Two-moons datasets with different complexity: (a) Two-moons-1 dataset and (b) Two-moons-2 dataset.

Classification accuracy of TWSVM, PTSVM, and LPPTSVM on XOR datasets.
Classification accuracy of TWSVM, PTSVM, and LPPTSVM on Two-moons datasets.
UCI datasets
Test results of linear MVSVM, TWSVM, PTSVM, and LPPTSVM.
Test results of nonlinear MVSVM, TWSVM, PTSVM, and LPPTSVM.
Conclusions
In this paper, a novel locality preserving projection twin support vector machine is proposed by combining LPP with PTSVM. This method not only inherits the advantages of TWSVM and PTSVM, but also fully considers the local geometrical structure information between samples. Experimental results obtained on artificial datasets and real-world UCI datasets illustrate the effectiveness of the proposed LPPTSVM. It should be pointed out that our LPPTSVM involves many parameters, so parameter selection is a practical problem that should be investigated in the future.
Acknowledgment
The authors would like to thank Dr Yuan-Hai Shao from Zhejiang University of Technology for his valuable discussion and help.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported in part by the National Natural Science Foundation of China under grant nos 61373055 and 61103128, and the Youth Fund Project of Anqing normal university under grant no. KJ201308.
