Sage Journals: Discover world-class research

Abstract

Localization technology based on received signal strength and machine learning has recently attracted a lot of attention, since with enough labeled training data it can achieve high positioning accuracy. However, collecting enough labeled training data in a broad outdoor space is expensive. To reduce the cost of building and maintaining the training database, a semi-supervised extreme learning machine is applied to cellular network localization in this article. However, the performance of this algorithm is sensitive to the values of the hyper parameters. Without any systematic guidance, the optimal hyper parameters can only be selected by experienced workers through trial and error. To address this problem, we propose a novel algorithm that combines particle swarm optimization with the semi-supervised extreme learning machine to select its optimal hyper parameters automatically. The experiments demonstrate that applying particle swarm optimization in our optimization framework makes the hyper parameters of the semi-supervised extreme learning machine self-adaptive under different conditions. Moreover, the proposed method is more stable than the general semi-supervised extreme learning machine and outperforms the other compared methods.

Keywords

Cellular network localization, semi-supervised extreme learning machine, particle swarm optimization, regularization

Introduction

The highly developed cellular network, with its worldwide signal coverage, and the popularity of the smartphone have made cellular-network-based localization an important outdoor localization technology. When the Global Positioning System (GPS) is unavailable, the smartphone has no choice but to rely on the cellular network to obtain location information. Meanwhile, acquiring accurate location information is a prerequisite of many Internet of Things (IoT) applications, whose development connects more and more devices to the cellular network. This situation motivates us to improve positioning accuracy in the cellular network environment.

Traditional localization technologies, such as time of arrival (TOA), time difference of arrival (TDOA), and angle of arrival (AOA), are based on geometrical distance. Compared with these technologies, the technology based on received signal strength (RSS) and machine learning is more suitable for the non-line-of-sight (NLOS) environment.¹ It remains a challenging task, however, because the labeled RSS data with location information used for training must be collected manually by moving devices through the target area, and this collection process is time-consuming in outdoor space. In contrast, unlabeled RSS data without location information are easier to collect: they can be extracted from the measurement reports (MR) uploaded by mobile devices in the target area.

Since existing semi-supervised learning algorithms can use unlabeled data to reduce the demand for labeled data, some of them have been proposed for indoor localization in WIFI networks, such as the label propagation algorithm (LP algorithm), which is applied in Liu et al.² and Sadiq and Valaee.³ However, the LP algorithm is inefficient: in the online positioning phase, it estimates the location by labeling each newly arriving RSS sample through the LP process, which involves the whole dataset. Lin et al.⁴ applied spectral decomposition of the Laplacian matrix to label the unlabeled data by aligning the labeled data in the eigenvector space. They then used the entire labeled dataset to train a supervised learning algorithm for localization.

To address ill-posed problems, Tikhonov⁵ put forward a regularization method. The basic idea is to introduce a regularization function that encodes prior knowledge about the solution, so that a stable solution can be obtained by solving a well-posed problem that approximates the original ill-posed problem. In essence, many machine learning problems are ill-posed. Since the 1990s, the trend in classical regularization techniques has been to make the solution function $f$ smooth and stable in the reproducing kernel Hilbert space (RKHS). For example, the support vector machine was built on the structural risk minimization (SRM) principle. However, the basic Tikhonov regularization method focuses mainly on the smoothness of the function; it does not take into account the inner structure of the manifold from which the samples come.

To overcome the ill-posedness of semi-supervised learning problems, Belkin and Niyogi^6–8 extended the basic Tikhonov regularization framework by introducing a manifold regularizer and called the result the manifold regularization framework. Through the manifold regularizer, this framework approximates the manifold structure by a graph built from the labeled and unlabeled data. Well-known semi-supervised learning algorithms such as the Laplacian support vector machine (Lap-SVM)^6,7,9 and the semi-supervised extreme learning machine (SS-ELM)¹⁰ are built on this framework.

Liu et al.¹¹ proposed an algorithm called SELM for indoor localization based on the theory of the extreme learning machine (ELM)^12,13 and manifold regularization.^6,7 It inherits the properties of ELM, so it can utilize unlabeled data to improve localization accuracy while training quickly. However, SELM does not control the complexity of the classifier. An improved algorithm called SS-ELM was proposed by Huang et al.,¹⁰ also based on manifold regularization. Unlike SELM, SS-ELM adds the Tikhonov regularization constraint to its objective function, so the complexity of the classifier can be controlled. Moreover, SS-ELM can weight the labeled samples in order to handle imbalanced classification problems. Iosifidis et al.¹⁴ put forward a discriminant extreme learning machine (DELM) from the discriminant analysis perspective. Based on DELM, they proposed the semi-supervised discriminant extreme learning machine (SDELM) in the same paper.

Although ELM is much faster than traditional gradient-based neural network learning, the random determination of the input weights and hidden biases means that ELM needs more hidden units, which may result in ill-posedness.¹⁵ To obtain a more compact network with better generalization ability, most evolutionary ELM research focuses on optimizing the input weights and hidden biases. One example is E-ELM, proposed by Zhu et al.¹⁶ In the framework of E-ELM, the input weights and hidden biases of ELM are optimized by differential evolution (DE) according to the root mean square error (RMSE) and the norm of the output weights. Later, Cao et al.¹⁷ put forward an improved version of E-ELM named Sa-ELM by replacing the standard DE with self-adaptive DE, whose trial vector generation strategies and associated control parameters are self-adapted. In the same way, the PSO algorithm has been applied to the optimization of ELM, as in PSO-ELM by Xu and Shu¹⁵ and IPSO-ELM by Han et al.¹⁸

Through experiments, we find that the lack of labeled training data makes ill-posedness prominent in semi-supervised learning. Thus, the prior knowledge carried by the regularizers is more important than the selection of the input weights and hidden biases. That is why the performance of SS-ELM is sensitive to the values of the hyper parameters when labeled data are scarce. Moreover, the optimal hyper parameters of SS-ELM vary with the situation; in particular, different numbers of labeled training data usually lead to different optimal hyper parameters. Without any systematic guidance, the optimized hyper parameters can only be determined by experienced workers through trial and error.

In this article, we treat this parameter selection problem as a parameter optimization problem. We present a hybrid of particle swarm optimization and the semi-supervised extreme learning machine, which for convenience we call PSO-SSELM. Practice has shown that the stochastic search algorithm particle swarm optimization (PSO) is an efficient algorithm that can reach the global minimum.¹⁹ Compared to the genetic algorithm, PSO not only avoids complicated evolution operators but also has fewer parameters to adjust.²⁰ Based on the above context, the contributions of this work are as follows:

By implementing the SS-ELM, we reduce the demand of labeled training data without lowering the localization precision level in the cellular network environment.

In the proposed PSO-SSELM algorithm, we use PSO to optimize the hyper parameters of SS-ELM, so we obtain a hyper parameter self-adapting SS-ELM with optimal performance under different conditions.

Considering the lack of labeled information in semi-supervised learning problem, PSO optimizes the hyper parameters according to both the loss on labeled data and the number of extreme values on the whole dataset (including labeled and unlabeled data) in the framework. This important feature of PSO-SSELM makes the SS-ELM obtain better performance.

Preliminaries

PSO

PSO is a population-based stochastic optimization technique developed by Eberhart and Kennedy.^21,22 The concept of the particle swarm is inspired by the social behavior of bird flocking or fish schooling. The PSO algorithm can be described as an automatically evolving system.

PSO works by randomly generating a batch of particles over the given search space; these particles move with a certain velocity to find the global optimal position after some iterations. In each iteration, every particle updates its velocity according to its momentum, the contribution of its own best position $(P_{b})$ in its search history, and the global best position $(P_{g})$ in the search history. Then, the particles move to a new position according to their current velocity. If the dimension of the search space is $D$ and the total number of particles is $n$, the position of the $i$th particle can be denoted as the vector $X_{i} = (x_{i1}, x_{i2}, \dots, x_{iD})$; the best position found so far by the $i$th particle is denoted as $P_{ib} = (p_{i1}, p_{i2}, \dots, p_{iD})$; the global best position is denoted as $P_{g} = (p_{g1}, p_{g2}, \dots, p_{gD})$; and the velocity is represented as $V_{i} = (v_{i1}, v_{i2}, \dots, v_{iD})$. Based on the above context, the original PSO can be described by the following functions:

Velocity update

v_{id}(t + 1) = v_{id}(t) + c_{1} \cdot rand() \cdot [p_{id}(t) - x_{id}(t)] + c_{2} \cdot rand() \cdot [p_{gd}(t) - x_{id}(t)]

(1)

v_{id}(t + 1) = \begin{cases} v_{\max} & \text{if } v_{id}(t + 1) > v_{\max} \\ v_{\min} & \text{if } v_{id}(t + 1) < v_{\min} \end{cases}

(2)

where the positive constants $c_{1}$ and $c_{2}$ are acceleration factors, $rand()$ is a random number between 0 and 1, $v_{\max}$ and $v_{\min}$ are the upper and lower bounds of the particle velocity, and $t$ is the iteration generation.

Particle position update

x_{id}(t + 1) = x_{id}(t) + v_{id}(t + 1), \quad 1 \leq i \leq n, \; 1 \leq d \leq D

(3)

Optimal position update

P_{ib} = \begin{cases} X_{i} & \text{if } f(X_{i}) < f(P_{ib}) \\ P_{ib} & \text{otherwise} \end{cases}

(4)

P_{g} = \begin{cases} X_{i} & \text{if } f(X_{i}) < f(P_{g}) \\ P_{g} & \text{otherwise} \end{cases}

(5)

An improved version, adaptive particle swarm optimization (APSO), was put forward by Shi and Eberhart.²³ This algorithm can be described by the following functions

v_{id}(t + 1) = w(t) \cdot v_{id}(t) + c_{1} \cdot rand() \cdot [p_{id}(t) - x_{id}(t)] + c_{2} \cdot rand() \cdot [p_{gd}(t) - x_{id}(t)]

(6)

where $w (t)$ is the inertial weight, which will gradually reduce as the generation increases according to function (7)

w (t) = w_{\max} - \frac{t (w_{\max} - w_{\min})}{itermax}

(7)

where $w_{\max}$ and $w_{\min}$ are the initial maximum weight and minimum weight, respectively. The search efficiency of APSO is much better than that of the original PSO. However, as the iteration generation increases, the search space shrinks, so APSO is more easily trapped in a local minimum.
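The update rules (1)–(7) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `pso_minimize`, the default swarm size and iteration count, the symmetric velocity bound ($v_{\min} = -v_{\max}$), and the clipping of positions to the search box are all assumptions made for the example.

```python
import numpy as np

def pso_minimize(f, lower, upper, n_particles=20, n_iter=100,
                 c1=2.0, c2=2.0, v_max=0.4, w_max=None, w_min=None, seed=0):
    """Minimize f over a box with PSO, following Eqs. (1)-(5).

    If w_max and w_min are given, the decreasing inertia weight of
    Eqs. (6)-(7) is used; otherwise the weight is fixed at 1 (original PSO).
    """
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    x = rng.uniform(lower, upper, size=(n_particles, dim))    # positions X_i
    v = rng.uniform(-v_max, v_max, size=(n_particles, dim))   # velocities V_i
    p_best = x.copy()                                          # P_ib
    p_best_val = np.array([f(xi) for xi in x])
    g = int(np.argmin(p_best_val))
    g_best, g_best_val = p_best[g].copy(), p_best_val[g]       # P_g

    for t in range(n_iter):
        # Eq. (7): linearly decreasing inertia weight (1.0 for the original PSO)
        w = 1.0 if w_max is None else w_max - t * (w_max - w_min) / n_iter
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Eq. (1) / Eq. (6): velocity update
        v = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)
        # Eq. (2): clamp velocities (assumes v_min = -v_max)
        v = np.clip(v, -v_max, v_max)
        # Eq. (3): position update; clipping to the box is an added assumption
        x = np.clip(x + v, lower, upper)
        vals = np.array([f(xi) for xi in x])
        # Eq. (4): personal best update
        improved = vals < p_best_val
        p_best[improved] = x[improved]
        p_best_val[improved] = vals[improved]
        # Eq. (5): global best update
        g = int(np.argmin(p_best_val))
        if p_best_val[g] < g_best_val:
            g_best, g_best_val = p_best[g].copy(), p_best_val[g]
    return g_best, g_best_val

# Toy usage: a shifted sphere function with its minimum at (-3, -3).
sphere = lambda x: float(np.sum((x + 3.0) ** 2))
best_x, best_val = pso_minimize(sphere, [-10.0, -10.0], [0.0, 0.0],
                                w_max=0.9, w_min=0.4)
```

Passing `w_max` and `w_min` switches from the original PSO to the inertia-weight variant; the values 0.9 and 0.4 in the usage lines are illustrative choices, not taken from the article.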

SS-ELM

ELM belongs to the family of single hidden layer feed forward networks (SLFN). However, unlike the traditional view that all the parameters in a feed forward network need to be tuned, the training process of ELM adjusts only the output weights.^12,13 Consider a training set of $N$ samples $\{(X_{i}, T_{i}) \mid i = 1, 2, 3, \dots, N\}$, where $X_{i} \in R^{n}$ is an input vector and $T_{i} \in R^{m}$ is the corresponding output vector. The key idea of ELM is to find a mapping from the input space to the output space with minimum error. Given the number of hidden units $\tilde{N}$, the input weight vectors $w$, the hidden biases $b$, and the activation function $g(x)$, the ELM model can be expressed by

H = \begin{bmatrix} g(w_{1} \cdot x_{1} + b_{1}) & \dots & g(w_{\tilde{N}} \cdot x_{1} + b_{\tilde{N}}) \\ \vdots & \ddots & \vdots \\ g(w_{1} \cdot x_{N} + b_{1}) & \dots & g(w_{\tilde{N}} \cdot x_{N} + b_{\tilde{N}}) \end{bmatrix}_{N \times \tilde{N}}

(8)

T = \begin{bmatrix} t_{1}^{T} \\ \vdots \\ t_{N}^{T} \end{bmatrix}_{N \times m}, \quad β = \begin{bmatrix} β_{1}^{T} \\ \vdots \\ β_{\tilde{N}}^{T} \end{bmatrix}_{\tilde{N} \times m}

(9)

H β = T

(10)

where $H$ is the hidden layer output matrix, $β$ is the output weight matrix, and $T$ is the label matrix.

Since we do not need to adjust the input weights $w$ and the hidden bias $b$ , the training of ELM can be regarded as solving the linear system of $H β = T$ , where $H$ and $T$ are known. To obtain a unique and global optimal solution, Huang et al.^12,13 solved this system using Moore–Penrose inverse

\hat{β} = H^{†} T

(11)

where $H^{†}$ is the Moore–Penrose inverse of the matrix $H$. Although the Moore–Penrose inverse guarantees the existence and uniqueness of the solution, it cannot guarantee its stability, because unavoidable noise in the data and undersampling cause a difference between the observed value $S_{σ}$ and the real value $S$ in many situations.
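As a rough illustration of Equations (8)–(11), the following sketch trains a basic ELM with NumPy. The sigmoid activation, the uniform range of the random weights, the function names, and the toy data are assumptions for the example; the article does not specify them at this point.

```python
import numpy as np

def elm_hidden(X, W, b):
    """Hidden layer output matrix H of Eq. (8), using a sigmoid as g (assumed)."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def elm_fit(X, T, n_hidden=20, seed=0):
    """Draw random input weights and biases, then solve H beta = T as in Eq. (11)."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # input weights, never tuned
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # hidden biases, never tuned
    H = elm_hidden(X, W, b)
    beta = np.linalg.pinv(H) @ T                             # Moore-Penrose solution
    return W, b, beta

def elm_predict(X, W, b, beta):
    return elm_hidden(X, W, b) @ beta

# Toy regression data (hypothetical): 200 samples, 3 inputs, 2 outputs.
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(200, 3))
T = np.column_stack([np.sin(X[:, 0]), X[:, 1] * X[:, 2]])
W, b, beta = elm_fit(X, T, n_hidden=20)
rmse = np.sqrt(np.mean((elm_predict(X, W, b, beta) - T) ** 2))
```

Only `beta` is learned; `W` and `b` stay at their random initial values, which is what makes training a single linear solve.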

Therefore, to use the unlabeled data to improve the stability of ELM when labeled training data are scarce, Huang et al.¹⁰ put forward the semi-supervised extreme learning machine (SS-ELM) using the manifold regularization framework

\underset{β \in R^{\tilde{N} \times m}}{\arg\min} \; L_{SS - ELM} = \frac{1}{2} \| C_{e}^{\frac{1}{2}} (F - T) \|^{2} + \frac{c_{β}}{2} \| β \|^{2} + \frac{λ}{2} Tr(F^{T} L F)

(12)

\text{s.t. } F = H β

where the first term in the objective function is the empirical loss on the labeled training data, the second and third terms are the Tikhonov regularizer and the manifold regularizer, and $c_{β}$ and $λ$ are hyper parameters. $C_{e}$ is an $(l + u) \times (l + u)$ diagonal matrix with elements $[C_{e}]_{ii} = C_{i}, i = 1, 2, \dots, l$, and the rest of the elements equal to 0. $l$ and $u$ are the numbers of labeled and unlabeled training data. $C_{i}, i = 1, 2, \dots, l$ are penalty coefficients associated with the prediction errors of the different classes, which are used in case of an unbalanced data distribution. For the regression task, we set $C_{i}$ to 1. In the last term of function (12), $Tr (\cdot)$ is the trace of a matrix, and the graph Laplacian matrix $L$ is calculated by

L = D - W

(13)

where $D$ is a diagonal matrix given by $D_{ii} = \sum_{j = 1}^{l + u} w_{ij}$ and $W$ is an edge weight matrix, which is usually computed with the Gaussian kernel: $w_{ij} = e^{- \frac{\| x_{i} - x_{j} \|^{2}}{2 σ^{2}}}$, representing the similarity between samples $x_{i}$ and $x_{j}$.
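A minimal sketch of Equation (13) with a dense Gaussian-kernel weight matrix is given below. The function name `graph_laplacian` and the default `sigma` are illustrative assumptions; practical implementations often sparsify $W$ with a nearest-neighbor graph, which is not shown here.

```python
import numpy as np

def graph_laplacian(X, sigma=1.0):
    """Dense graph Laplacian L = D - W with Gaussian edge weights, Eq. (13)."""
    sq_norms = np.sum(X ** 2, axis=1)
    # Pairwise squared Euclidean distances; clip tiny negatives from round-off.
    d2 = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T, 0.0)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    D = np.diag(W.sum(axis=1))
    # Self-weights w_ii appear in both D and W, so they cancel in D - W.
    return D - W

# Toy usage on six random 3-D points (hypothetical data).
pts = np.random.default_rng(0).normal(size=(6, 3))
L = graph_laplacian(pts, sigma=1.0)
```

The resulting matrix is symmetric with zero row sums and is positive semi-definite, which is what makes $Tr(F^{T} L F)$ a valid smoothness penalty.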

To obtain the optimal solution of $β$ , we need to compute the gradient of the objective function (12) with respect to $β$

\nabla L_{SS - ELM} = H^{T} C_{e} (H β - T) + c_{β} β + λ H^{T} LH β

(14)

By setting the gradient to zero, we get the optimal solution

β^{*} = (c_{β} I_{\tilde{N} \times \tilde{N}} + H^{T} C_{e} H + λ H^{T} LH)^{- 1} H^{T} C_{e} T

(15)

If the number of labeled training data is less than the number of hidden units in SS-ELM, the solution can instead be obtained by

β^{*} = H^{T} (c_{β} I_{(l + u) \times (l + u)} + C_{e} H H^{T} + λ LH H^{T})^{- 1} C_{e} T

(16)

where $I_{\tilde{N} \times \tilde{N}}$ and $I_{(l + u) \times (l + u)}$ are identity matrices of dimensions $\tilde{N}$ and $l + u$.
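The two closed forms (15) and (16) can be checked against each other numerically. The sketch below is illustrative only: the function names, the random stand-in for $H$, and the toy sizes are assumptions, and the rows of $H$ and $T$ are assumed to be ordered with the $l$ labeled samples first (unlabeled rows of $T$ set to zero).

```python
import numpy as np

def ss_elm_beta_primal(H, T, L, n_labeled, c_beta, lam, C=1.0):
    """Eq. (15): solve an (N_hidden x N_hidden) system."""
    n, n_hidden = H.shape
    c_e = np.zeros(n)
    c_e[:n_labeled] = C                       # diagonal of C_e; zero for unlabeled rows
    A = c_beta * np.eye(n_hidden) + H.T @ (c_e[:, None] * H) + lam * H.T @ L @ H
    return np.linalg.solve(A, H.T @ (c_e[:, None] * T))

def ss_elm_beta_dual(H, T, L, n_labeled, c_beta, lam, C=1.0):
    """Eq. (16): solve an ((l+u) x (l+u)) system instead."""
    n = H.shape[0]
    c_e = np.zeros(n)
    c_e[:n_labeled] = C
    Ce = np.diag(c_e)
    A = c_beta * np.eye(n) + Ce @ H @ H.T + lam * L @ H @ H.T
    return H.T @ np.linalg.solve(A, Ce @ T)

# Toy problem (hypothetical sizes): 5 labeled + 25 unlabeled samples, 10 hidden units.
rng = np.random.default_rng(0)
n_lab, n_unl, n_hidden, m = 5, 25, 10, 2
H = rng.normal(size=(n_lab + n_unl, n_hidden))          # stand-in for Eq. (8)
pts = rng.normal(size=(n_lab + n_unl, 2))
d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=2)
Wg = np.exp(-d2 / 2.0)
L = np.diag(Wg.sum(axis=1)) - Wg                        # Eq. (13)
T = np.zeros((n_lab + n_unl, m))
T[:n_lab] = rng.normal(size=(n_lab, m))                 # labels only for the first rows

c_beta, lam = 1e-2, 1e-3
beta15 = ss_elm_beta_primal(H, T, L, n_lab, c_beta, lam)
beta16 = ss_elm_beta_dual(H, T, L, n_lab, c_beta, lam)

# Gradient of Eq. (14) evaluated at the solution; it should vanish.
c_e = np.zeros(n_lab + n_unl)
c_e[:n_lab] = 1.0
grad = H.T @ (c_e[:, None] * (H @ beta15 - T)) + c_beta * beta15 + lam * H.T @ L @ H @ beta15
```

For $c_{β} > 0$ both systems are invertible and give the same $β^{*}$, so the choice between them is a matter of which matrix is smaller.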

Hybrid PSO and SS-ELM for localization

The hyper parameters $c_{β}$ and $λ$ reflect the importance of the prior knowledge contributed by the Tikhonov regularizer and the manifold regularizer. In the manifold regularization framework, the Tikhonov regularizer and the manifold regularizer control the smoothness in the ambient and intrinsic spaces, respectively. The reasons why we use the two regularizers jointly are as follows:⁶

We do not usually have access to the true marginal distribution, just to data points sampled from it. Therefore, regularization with respect only to the sampled manifold will lead to an ill-posed classifier.

In some situations the manifold assumption holds to a lesser degree, so a smaller impact of the manifold regularizer in the objective function of the classifier will produce a better solution.

That means being able to trade off these two regularizers is important in practice. However, there is no theoretical guidance for selecting the optimal hyper parameters that balance the manifold regularizer and the Tikhonov regularizer. Since it is difficult to summarize the complex parameter adjustment method into a theory, we approach this problem from the parameter optimization perspective.

Before we can optimize the hyper parameters $c_{β}$ and $λ$ using PSO, we need to solve two other problems: encoding the hyper parameters $c_{β}$ and $λ$ and defining a reasonable fitness function $f (\cdot)$.

For the first problem, we use the real number encoding method by equations: $c_{β} = 10^{x_{1}}$ , $λ = 10^{x_{2}}$ , where $x_{1}, x_{2} \in [- 10, 0]$ . Then, $c_{β}$ and $λ$ can be represented as particles: $X_{i} = (x_{i 1}, x_{i 2})$ , where $i$ is the index of the particle.

As we mentioned in the second section, the fitness function $f (\cdot)$ is the particle evaluation criterion. Since we apply SS-ELM with only a few labeled training data, a fitness based only on the prediction error on the limited labeled data would lead to an ill-posed SS-ELM model. Through experiments we notice that, without proper hyper parameters, SS-ELM is likely to produce extreme values when labeled data are scarce. Although it is not possible to measure the label prediction error on the whole training dataset (including labeled and unlabeled data), counting the number of extreme values is a feasible way to obtain more comprehensive information.

Based on the analysis above, we can solve the second problem: our fitness function is based on both the label prediction error and the number of extreme values. By introducing the number of extreme values into the fitness function, PSO is able to utilize the unlabeled data, which broadens the information that PSO uses to select proper hyper parameters. Therefore, for the regression model, the fitness function is expressed as

f (\cdot) = \frac{\sum_{j = 1}^{l} \sqrt{\sum_{k = 1}^{d} ∥ {\hat{T}}_{jk} - T_{jk} ∥}}{l} \times \frac{l}{u + l} + R \times \frac{u}{u + l}

(17)

where $T_{jk}$ is the label of labeled data, and ${\hat{T}}_{jk}$ is the prediction on the labeled data, thus the first term of the function (17) is the label prediction error. $u$ and $l$ are the number of unlabeled training data and labeled training data, and $R$ is the ratio of the extreme values to the whole training dataset. For the classification model, we give the fitness function as

f (\cdot) = F_{1} \times \frac{l}{u + l} + R \times \frac{u}{u + l}

(18)

where $F_{1}$ is the F1 score on the labeled training data. The F1 score is given by

F_{1} = \frac{2 \times P \times R}{P + R} = \frac{2 \times TP}{l + TP - TN}

(19)

where $P$ and $R$ are the precision and recall (in function (19) only, $R$ denotes recall; elsewhere it denotes the extreme value ratio), $TP$ is the number of true positives, and $TN$ is the number of true negatives. The extreme value ratio $R$ is computed as follows.

Denote the training output of SS-ELM as $\widehat{output} = \{\hat{O}_{1}, \hat{O}_{2}, \dots, \hat{O}_{i}, \dots, \hat{O}_{l + u}\}$, where $\hat{O}_{i} = (\hat{o}_{1i}, \hat{o}_{2i}, \dots, \hat{o}_{ji}, \dots, \hat{o}_{ni})$ is the output vector in the output layer of SS-ELM for the $i$th sample. An extreme value is then defined as follows: for any $\hat{O}_{i} \in \widehat{output}$, if there exists a component $\hat{o}_{ji} \notin [lower_{j}, upper_{j}]$, then the output $\hat{O}_{i}$ from SS-ELM is an extreme value. If the number of extreme values is $κ$, then the ratio of extreme values is

R = \frac{κ}{l + u}

(20)

According to fitness functions (17) and (18), the contribution of the RMSE and F1 score on the labeled data increases with the proportion of labeled training data. That means that when we have enough labeled data, PSO mainly depends on the labeled information in the optimization phase, because the information from labeled data is more accurate than that from unlabeled data. To some extent, this strategy makes the model self-adaptive.
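The regression fitness (17) and the extreme value ratio (20) can be sketched as follows. Two points are assumptions of this example rather than statements of the article: the per-sample error is taken as the Euclidean distance between predicted and true coordinates, and the bounds `lower` and `upper` are supplied by the caller (for instance, the bounding box of the target area). The function names and the toy numbers are hypothetical.

```python
import numpy as np

def extreme_ratio(outputs, lower, upper):
    """Eq. (20): fraction of output vectors with any component outside its bounds."""
    outputs = np.asarray(outputs, dtype=float)
    outside = (outputs < lower) | (outputs > upper)
    return float(np.mean(outside.any(axis=1)))

def regression_fitness(pred_all, T_labeled, lower, upper):
    """Eq. (17): labeled error and extreme value ratio, weighted by l and u.

    pred_all holds the outputs for all l + u training samples, labeled rows first.
    """
    pred_all = np.asarray(pred_all, dtype=float)
    T_labeled = np.asarray(T_labeled, dtype=float)
    l = T_labeled.shape[0]
    u = pred_all.shape[0] - l
    # Mean per-sample Euclidean error on the labeled rows (assumed form of the error term).
    err = np.sqrt(np.sum((pred_all[:l] - T_labeled) ** 2, axis=1)).mean()
    R = extreme_ratio(pred_all, lower, upper)
    return err * l / (l + u) + R * u / (l + u)

# Worked toy case: 2 labeled and 2 unlabeled samples, outputs bounded by [0, 10]^2.
pred_all = [[0.0, 0.0], [3.0, 4.0], [100.0, 0.0], [1.0, 1.0]]
T_labeled = [[0.0, 0.0], [0.0, 0.0]]
lower, upper = np.array([0.0, 0.0]), np.array([10.0, 10.0])
R_toy = extreme_ratio(pred_all, lower, upper)                  # 1 of 4 outputs is extreme
f_toy = regression_fitness(pred_all, T_labeled, lower, upper)  # 2.5 * 0.5 + 0.25 * 0.5
```

In the toy case the labeled errors are 0 and 5 (mean 2.5), one of the four outputs leaves the box (ratio 0.25), and with $l = u = 2$ both terms receive weight 0.5.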

Finally, we summarize the procedure of our proposed method as follows:

Step 1. Randomly initialize the input weights $w$ and hidden biases $b$ of SS-ELM and compute the hidden matrix $H$ according to function (8). Meanwhile, build the Laplacian matrix $L$ based on the labeled and unlabeled data.

Step 2. Randomly generate a batch of particles. Then, denote the initial position of particle with index $i$ as vector $X_{i} = (x_{i 1}, x_{i 2})$ , where $x \in [- 10, 0]$ and denote the velocity as $V_{i} = (v_{i 1}, v_{i 2})$ where $v_{i 1}, v_{i 2} \in [- 0.4, 0.4]$ .

Step 3. Compute the fitness for each particle according to the functions (17) or (18).

Step 4. Update the velocity, position, and best positions of the particles according to functions (1)–(5).

Step 5. If the global optimal fitness has met the convergence condition, stop the iterations and obtain the optimal hyper parameters; otherwise, continue the iterations from Step 3.
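Steps 1–5 can be put together in a compact end-to-end sketch. Everything concrete here is an assumption for illustration: the synthetic data, the sigmoid activation, the kernel width, the swarm size, the known output range, and the use of a fixed iteration budget in place of a convergence test. It is not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (hypothetical): 2-D inputs, 2-D targets inside [0, 1]^2.
n_lab, n_unl, n_hidden = 10, 90, 30
n_all = n_lab + n_unl
X = rng.uniform(0.0, 1.0, size=(n_all, 2))
T = np.zeros((n_all, 2))
T[:n_lab] = np.column_stack([X[:n_lab, 0], X[:n_lab, 1] ** 2])  # labels for first rows only
lower, upper = np.zeros(2), np.ones(2)                          # assumed valid output range

# Step 1: random hidden layer (Eq. (8)) and graph Laplacian (Eq. (13)).
W_in = rng.uniform(-1.0, 1.0, size=(2, n_hidden))
b = rng.uniform(-1.0, 1.0, size=n_hidden)
H = 1.0 / (1.0 + np.exp(-(X @ W_in + b)))
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
Wg = np.exp(-d2 / (2.0 * 0.1 ** 2))                             # sigma = 0.1, arbitrary
L = np.diag(Wg.sum(axis=1)) - Wg
c_e = np.zeros(n_all)
c_e[:n_lab] = 1.0                                               # C_i = 1 for regression
HtCH = H.T @ (c_e[:, None] * H)
HtLH = H.T @ L @ H
HtCT = H.T @ (c_e[:, None] * T)

def fitness(x):
    """Decode a particle, solve Eq. (15), and score the model with Eq. (17)."""
    c_beta, lam = 10.0 ** x[0], 10.0 ** x[1]
    beta = np.linalg.solve(c_beta * np.eye(n_hidden) + HtCH + lam * HtLH, HtCT)
    F = H @ beta
    err = np.sqrt(((F[:n_lab] - T[:n_lab]) ** 2).sum(axis=1)).mean()
    R = np.mean(((F < lower) | (F > upper)).any(axis=1))        # Eq. (20)
    return err * n_lab / n_all + R * n_unl / n_all

# Step 2: particles x in [-10, 0]^2 with velocities in [-0.4, 0.4].
n_particles, n_iter, c1, c2 = 15, 40, 2.0, 2.0
x = rng.uniform(-10.0, 0.0, size=(n_particles, 2))
v = rng.uniform(-0.4, 0.4, size=(n_particles, 2))

# Step 3: initial fitness, personal bests, and global best.
p_best, p_val = x.copy(), np.array([fitness(xi) for xi in x])
g_best, g_val = p_best[p_val.argmin()].copy(), p_val.min()

# Steps 4-5: iterate Eqs. (1)-(5); a fixed budget replaces the convergence test.
for _ in range(n_iter):
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    v = np.clip(v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x), -0.4, 0.4)
    x = np.clip(x + v, -10.0, 0.0)
    vals = np.array([fitness(xi) for xi in x])
    better = vals < p_val
    p_best[better], p_val[better] = x[better], vals[better]
    if p_val.min() < g_val:
        g_best, g_val = p_best[p_val.argmin()].copy(), p_val.min()

c_beta_opt, lam_opt = 10.0 ** g_best
```

Because $H$ and $L$ are fixed in Step 1, the products `HtCH`, `HtLH`, and `HtCT` are computed once, and each fitness evaluation costs only one small linear solve.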

Experiment

Experimental setup

To our knowledge, there is no appropriate public cellular network benchmark dataset for RSS-based localization research, so simulated data are used in our experiments. To obtain the RSS data, the radio propagation environment in an urban area is simulated based on the real three-dimensional (3D) map and the base station distribution of the city of Guangzhou using the Volcano URBAN model (Volcano propagation model: www.siradel.com/software/connectivity/volcano-software/), which is one of the 3D ray tracing propagation models²⁴ integrated in the commercial radio planning software Atoll version 3.1. Figure 1 gives the rendering of this simulation. The simulation resolution is set to 5 m in Atoll; therefore, we obtain one sample for every 5 m.

Figure 1.

Simulation output based on the Volcano model.

Table 1 shows the format of an RSS sample, where the first 35 columns are the simulated RSS values for each base station and the last 2 columns are the real-world latitude and longitude coordinates (in meters, UTM from WGS84). To evaluate the performance of the algorithm, the localization area must be defined; in this article we consider a localization area of $300 m \times 600 m$ with more than 7000 samples in total.

Table 1.

The format of simulated data.

BST₁	…	BST₂₆	BST₂₇	BST₂₈	BST₂₉	…	BST₃₅	Latitude	Longitude
0.0 (dB)	…	−47.313 (dB)	−49.625 (dB)	−50 (dB)	48.688 (dB)	…	0.0 (dB)	736,555 (m)	2,556,250 (m)

Performance measurement

The performance measurements are different according to the ways we estimate the location of terminals:

1. Location region recognition. The localization area is divided into several subregions, and the location of a terminal is estimated by recognizing the subregion to which the terminal belongs. Usually, a classification algorithm is used to recognize the regions. Therefore, the True Positive (TP), False Positive (FP), True Negative (TN), and False Negative (FN) counts are employed to obtain performance metrics such as accuracy (AC), precision (P), and recall (R) for one experiment

AC = \frac{TP + TN}{TP + FP + TN + FN}

(21)

P = \frac{TP}{TP + FP}

(22)

R = \frac{TP}{TP + FN}

(23)

Through the evaluation for one single experiment, we can obtain macro metrics for multiple experiments

\text{macro-}P = \frac{1}{k} \sum_{i = 1}^{k} P_{i}

(24)

\text{macro-}R = \frac{1}{k} \sum_{i = 1}^{k} R_{i}

(25)

\text{macro-}AC = \frac{1}{k} \sum_{i = 1}^{k} AC_{i}

(26)

2. Coordinate prediction. Rather than recognizing subregions, this method estimates the coordinates of the terminal directly. Usually, a regression model is used to predict the coordinates. Therefore, we evaluate the performance of the algorithms by measuring the error distance (ED) between the estimated location and the real location

ED = \sqrt{\frac{1}{m} \sum_{i = 1}^{m} {(loc_{i} - f_{i}(rss_{i}))}^{2}}

(27)

To evaluate the stability of the algorithm, we also need the metric $R$, the number of extreme values defined above. To evaluate multiple experiments, the average error distance $(\overline{ED})$, standard deviation (STD), and average number of extreme values $(\bar{R})$ are obtained by

\overline{ED} = \sqrt{\frac{1}{k} \sum_{j = 1}^{k} ED_{j}}

(28)

STD = \sqrt{\frac{1}{k - 1} \sum_{j = 1}^{k} {(ED_{j} - \overline{ED})}^{2}}

(29)

\bar{R} = \frac{1}{k} \sum_{i = 1}^{k} R_{i}

(30)

Since SS-ELM can be applied to both classification and regression, we evaluate the proposed method in both ways in order to demonstrate its strength from different perspectives. For the classification setting, we divide the $300 m \times 600 m$ target area equally into 50 subregions.

To evaluate the performance, the cross-validation technique is applied in the experiments. In a typical k-fold cross-validation, the samples are randomly split into k equal subsets. The subsets are then divided into two sets: the test set, consisting of one subset, and the training set, consisting of the remaining (k − 1) subsets. The k-fold cross-validation process is described in Algorithm 1.

Algorithm 1. Description of k-fold cross-validation process.
Step 1: Partition the data into k equal sets $T_{1}, \dots, T_{k}$.
Step 2: For i = 1 to k, use $T_{i}$ for validation and the remaining k − 1 sets for training. Compute the precision $(P_{i})$, recall $(R_{i})$, and accuracy $(AC_{i})$, or the error distance $(ED_{i})$ and the number of extreme values $R_{i}$.
Step 3: Compute the macro precision (macro-P), macro recall (macro-R), and macro accuracy (macro-AC), or the average error distance $(\overline{ED})$, the standard deviation $(STD)$, and the average number of extreme values $(\bar{R})$.
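The partitioning in Algorithm 1 can be sketched with a small generator. The function name `k_fold_indices` and the sample count in the usage lines are assumptions for illustration; when the number of samples is not divisible by k, `numpy.array_split` makes the folds differ in size by at most one.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)   # Step 1: random partition
    for i in range(k):                                       # Step 2: one fold held out per round
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

# Toy usage: 103 samples, 10 folds.
splits = list(k_fold_indices(103, 10))
```

Each pair can then be fed to the model under test, and the per-fold metrics collected for Step 3.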


Performance comparison

After dividing the data into a training set and a testing set, we further divide the training data into two groups: one group with labels and the rest unlabeled. Our experimental configuration is as follows: (1) operating system: Linux 3.19.0-25-generic, (2) CPU: Xeon E5-2640V2 @ 2.0 GHz, (3) memory: 32 GB, and (4) software: Python 2.7 and MATLAB R2015b for Lap-SVM.

The experiments are run in two groups. In the first group, the localization problem is treated as a regression task. The proposed PSO-SSELM is compared with ELM, SS-ELM, PSO-ELM, E-ELM, and support vector regression (SVR). Except for SVR, whose implementation is provided by scikit-learn 0.18,²⁵ we developed Python programs for all these algorithms based on the source code of ELM and SS-ELM from the homepage of its author (Source code of ELM and SS-ELM: www.ntu.edu.sg/home/egbhuang/elm_codes.html). The default parameters of SVR in scikit-learn 0.18 are used in our experiments. The number of hidden units in all the neural networks is 150. The hyper parameters of SS-ELM are set to constant values: $c_{β} = 10^{- 4}$, $λ = 10^{- 6}$. The fitness function of PSO-SSELM in this group is function (17).

Table 2 gives the experimental results of the first group. The experimental variable in this group is the number of labeled samples, and the best performance for each metric in each experiment is the bold value in the table. As observed from this table, the semi-supervised learning algorithms (PSO-SSELM, SS-ELM) outperform the supervised learning algorithms (ELM, PSO-ELM, E-ELM, SVR) when the number of labeled data is less than 50. When the number of labeled data is more than 50, they reach the same level. Notice that the proposed PSO-SSELM achieves the lowest error distance in five of the six experiments. Moreover, in these six experiments, SVR and the proposed PSO-SSELM produce far fewer extreme values than the other compared algorithms. In contrast, E-ELM and PSO-ELM tend to produce more extreme values than the other compared algorithms, because in both of them the input weights and hidden biases are optimized only according to the limited labeled data. By introducing the number of extreme values into the fitness function during training, however, PSO is able to utilize both labeled and unlabeled data to select optimal hyper parameters, which is more effective in improving the stability of SS-ELM. Through this mechanism, the hyper parameters of SS-ELM become self-adaptive, as demonstrated in Table 3. That is why the proposed PSO-SSELM can obtain an optimal SS-ELM with fewer extreme values.

Table 2.

The performance comparison of coordinate prediction.

Number of labeled samples	Algorithms	ED (m)	STD (m)	Number of extreme values
10	PSO-SSELM	90.54	2.86	0
	SS-ELM	112.79	10.99	32.6
	PSO-ELM	119.38	27.09	56.6
	E-ELM	120.11	23.73	57.1
	ELM	123.77	14.99	58.3
	SVR	116.79	10.68	0
30	PSO-SSELM	84.41	5.46	0.2
	SS-ELM	92.71	3.95	21.5
	PSO-ELM	104.68	6.37	38.7
	E-ELM	103.77	5.31	40.9
	ELM	102.09	4.40	27.1
	SVR	90.87	1.74	0
50	PSO-SSELM	77.43	3.68	0.4
	SS-ELM	82.49	3.03	7.3
	PSO-ELM	82.68	3.78	7.2
	E-ELM	83.10	2.79	8.1
	ELM	81.84	2.75	6.6
	SVR	81.92	1.50	0
70	PSO-SSELM	72.90	4.10	0.2
	SS-ELM	75.61	3.16	8.2
	PSO-ELM	76.33	3.16	9.2
	E-ELM	74.12	3.32	8.1
	ELM	75.81	1.56	8.3
	SVR	78.86	2.92	0
90	PSO-SSELM	71.07	2.50	0.3
	SS-ELM	72.02	2.80	10
	PSO-ELM	72.40	2.67	9.5
	E-ELM	72.31	2.82	10.1
	ELM	71.71	2.68	7.3
	SVR	77.73	2.90	0
110	PSO-SSELM	71.71	2.23	0
	SS-ELM	70.50	3.10	5.8
	PSO-ELM	70.77	2.92	6.2
	E-ELM	70.12	2.89	5.9
	ELM	69.80	2.06	5.0
	SVR	74.63	2.49	0

Table 3.

The optimal hyper parameters of PSO-SSELM for each experiment.

Number of labeled samples	$c_{β}$	$λ$
10	0.0517552494024	$1.49546572208 \times 10^{- 7}$
30	0.238364637891	$1.12271620374 \times 10^{- 7}$
50	0.218215340234	$1.05474933206 \times 10^{- 7}$
70	0.229264553497	$6.91432149339 \times 10^{- 5}$
90	0.00271146597849	$1.57889993735 \times 10^{- 4}$
110	0.00131098351761	$2.54316244684 \times 10^{- 4}$

In the second group of experiments, the localization problem is treated as a classification task. In this group of experiments, the compared methods are several semi-supervised classification algorithms, including LP,²⁵ Lap-SVM,⁹ SS-ELM, and SDELM. The default parameters in the library are used for the LP and Lap-SVM. The parameters of SS-ELM and PSO-SSELM are the same with what we declared in the first group of experiments. The fitness function of PSO-SSELM is function (18).

Table 4 reports the macro precision (%), macro recall (%), and macro accuracy (%) under 10-fold validation, with the best performance shown in bold. Among these six experiments, the proposed method achieves the best precision in four experiments, the best recall in three, and the best accuracy in four. From this result we infer that the proposed algorithm performs better than the other graph-based algorithms (SS-ELM, SDELM, Lap-SVM, LP) in the comparison experiments. The likely reason is that the hyper parameters of SS-ELM in the proposed algorithm are self-adaptive, which leads to optimal performance of SS-ELM under different conditions. The other compared methods use non-adaptive parameters, so they do not achieve their best performance under certain conditions. For example, LP may not converge to the global optimal solution with the default parameters.

Table 4.

The performance comparison of regions recognition.

Number of labeled samples	Algorithms	Precision (%)	Recall (%)	Accuracy (%)
100	PSO-SSELM	14.67	17.51	18.68
	SS-ELM	10.59	18.22	17.93
	SDELM	8.57	15.33	15.67
	LP	6.39	13.50	12.20
	Lap-SVM	19.60	15.64	19.42
200	PSO-SSELM	24.10	27.31	29.66
	SS-ELM	17.49	21.55	22.66
	SDELM	23.45	24.96	25.33
	LP	13.98	21.46	23.79
	Lap-SVM	22.65	26.63	26.63
300	PSO-SSELM	31.10	30.14	30.84
	SS-ELM	22.36	25.95	27.10
	SDELM	30.68	30.53	31.25
	LP	15.67	23.76	26.32
	Lap-SVM	30.48	31.96	33.59
400	PSO-SSELM	40.08	40.98	39.46
	SS-ELM	23.17	26.30	28.16
	SDELM	31.55	32.83	33.72
	LP	16.75	24.86	27.85
	Lap-SVM	43.15	37.47	36.90
500	PSO-SSELM	45.10	41.21	41.69
	SS-ELM	25.57	27.67	29.23
	SDELM	31.15	31.42	32.10
	LP	18.32	25.95	28.69
	Lap-SVM	42.07	41.50	41.37
600	PSO-SSELM	47.04	44.43	43.89
	SS-ELM	26.33	27.88	30.25
	SDELM	36.67	35.04	35.49
	LP	19.00	27.45	30.96
	Lap-SVM	43.40	38.45	40.96
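For readers unfamiliar with macro-averaged metrics, the following is a minimal sketch of how macro precision and macro recall are commonly computed: the metric is evaluated per class and then averaged with equal class weights. The exact definitions used for Table 4 are given in the performance measurement section and may differ in detail; the function name `macro_precision_recall` and the convention of scoring an empty class as 0 are assumptions.

```python
import numpy as np

def macro_precision_recall(y_true, y_pred):
    """Unweighted mean of per-class precision and recall."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    classes = np.unique(np.concatenate([y_true, y_pred]))

    precisions, recalls = [], []
    for c in classes:
        tp = np.sum((y_pred == c) & (y_true == c))   # correctly predicted as c
        n_predicted = np.sum(y_pred == c)            # everything predicted as c
        n_actual = np.sum(y_true == c)               # everything truly in c
        # Assumed convention: a class with no predictions (or no samples) scores 0.
        precisions.append(tp / n_predicted if n_predicted else 0.0)
        recalls.append(tp / n_actual if n_actual else 0.0)

    return float(np.mean(precisions)), float(np.mean(recalls))

# Toy example with three regions (made-up labels).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
macro_p, macro_r = macro_precision_recall(y_true, y_pred)
print(round(macro_p, 4), round(macro_r, 4))
```

Because every region counts equally, a method that ignores rarely visited regions is penalized even if its overall hit rate is high.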

From the two groups of experiments above, we can summarize that the proposed method makes the hyper parameters of SS-ELM self-adaptive by integrating PSO and SS-ELM. Therefore, SS-ELM can achieve optimal performance under different conditions. By comparing the optimization strategy of the proposed PSO-SSELM with the optimization strategies of E-ELM and PSO-ELM, we infer that the most appropriate way to improve the performance of the ELM family when labeled data are scarce is to supply reasonable hyper parameters. This is because, in the training phase of the extreme learning family, the output weights are determined by a matrix inverse, which is likely to produce extreme values that cause huge errors. We solve this problem by using both labeled and unlabeled data in the fitness function of PSO, so that the proposed method gains comprehensive information for selecting the optimal hyper parameters, which improves the stability and the accuracy of SS-ELM.
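The sensitivity of the inverse can be illustrated with a minimal sketch. It uses a plain ridge-regularized ELM on random data, not the SS-ELM of this article (there is no graph Laplacian term), and all sizes, the sigmoid activation, and the regularization value `reg=1.0` are assumptions chosen for illustration. With few labeled samples and many hidden nodes, the unregularized pseudo-inverse solution has a larger norm than the regularized one, and large output weights are what allow extreme predictions on unseen inputs.

```python
import numpy as np

rng = np.random.default_rng(0)
n_labeled, n_features, n_hidden = 10, 6, 50   # few labels, many hidden nodes

# Made-up inputs (e.g. signal strengths) and 2-D coordinate targets.
X = rng.normal(size=(n_labeled, n_features))
y = rng.normal(size=(n_labeled, 2))

# Random, untrained hidden layer, as in the ELM family.
W = rng.normal(size=(n_features, n_hidden))
b = rng.normal(size=n_hidden)
H = 1.0 / (1.0 + np.exp(-(X @ W + b)))        # sigmoid hidden outputs

# Unregularized output weights via the Moore-Penrose pseudo-inverse.
beta_plain = np.linalg.pinv(H) @ y

def ridge_output_weights(H, y, reg):
    """Solve (H^T H + reg * I) beta = H^T y."""
    k = H.shape[1]
    return np.linalg.solve(H.T @ H + reg * np.eye(k), H.T @ y)

beta_reg = ridge_output_weights(H, y, reg=1.0)

norm_plain = np.linalg.norm(beta_plain)
norm_reg = np.linalg.norm(beta_reg)
print(norm_plain, norm_reg)
```

The regularization strength plays the same stabilizing role that the hyper parameters play in SS-ELM, which is why selecting them well matters most when labeled data are scarce.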

Analysis of fitness functions

In Figure 2(a) and (b), we plot the regression fitness in training according to function (17) and the RMSE of SS-ELM in testing, for the case in which the number of labeled training samples is 70. In these two figures, $x_{1}$ and $x_{2}$ are equal to $\log (c_{β})$ and $\log (λ)$, respectively. As observed, Figure 2(a) has an overall trend similar to that of Figure 2(b). In both Figure 2(a) and (b), the optimal values are obtained in the region where $x_{1} \in [- 2.5, - 0.5]$ and $x_{2} \in [- 9.0, - 2.5]$. In the region where $x_{1} \in [- 7.5, - 0.5]$ and $x_{2} \in [- 2.5, - 0.5]$, there is an obvious descent of the RMSE in Figure 2(b), whereas the descent of the fitness in Figure 2(a) is not obvious. This is because the risk of overfitting is high when there are only 70 labeled training samples. To control the number of extreme values under this condition, the contribution of extreme values to the fitness is larger than that of the RMSE. This justifies the effectiveness of the regression fitness function (17).

Figure 2.

(a) Regression fitness surface and (b) root mean square error surface.
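As a small worked check, the hyper parameters reported in Table 3 for 70 labeled samples can be mapped to the $(x_{1}, x_{2})$ coordinates of Figure 2. The base of the logarithm is not stated in this section; base 10 is assumed here, and pairing this Table 3 row with the surfaces of Figure 2 is our reading rather than something the text states explicitly.

```python
import math

# Table 3, row for 70 labeled samples.
c_beta_70 = 0.229264553497
lam_70 = 6.91432149339e-5

# Assumption: the axes of Figure 2 use base-10 logarithms.
x1 = math.log10(c_beta_70)
x2 = math.log10(lam_70)

# Optimal region reported for Figure 2(a) and (b).
in_region = (-2.5 <= x1 <= -0.5) and (-9.0 <= x2 <= -2.5)
print(f"x1 = {x1:.3f}, x2 = {x2:.3f}, inside reported region: {in_region}")
```

Under the base-10 assumption, the point selected by PSO falls inside the optimal region read from the figure.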

In the same way, we plot the classification fitness in training according to function (18) and the average classification accuracy (%) of SS-ELM in testing, for the case in which the number of labeled training samples is 300. Note that in Figure 3(a), the z axis is the negative of the average classification accuracy. In Figure 3(a), the optimal values are obtained in the region where $x_{1} \in [- 10.0, - 5.0]$ and $x_{2} \in [- 6, - 4]$. In Figure 3(b), the optimal values are in the region where $x_{1} \in [- 8.0, - 5.0]$ and $x_{2} \in [- 9, - 7]$. Although the optimal regions of these two figures do not intersect, this is still reasonable, mainly because the risk of outputting extreme values is high when $x_{2} \in [- 9.0, - 7.0]$. In our optimization strategy, the optimal fitness in Figure 3(a) is therefore obtained by moving $x_{2}$ in the direction of increasing $λ$, which increases the weight of the regularization constraint and helps prevent overfitting. In addition, the overall trends of the two graphs are similar. Based on the above analysis, we justify the effectiveness of the classification fitness function (18).

Figure 3.

(a) Classification fitness surface and (b) average classification accuracy surface.

Analysis of optimization methods

To explore the performance difference between PSO and other parameter selection algorithms, we replace the PSO in the framework of PSO-SSELM with other heuristic evolutionary algorithms, namely the genetic algorithm (GA)²⁶ and the DE algorithm,²⁷ and with the brute-force grid search algorithm (GS).²⁸ For convenience, we denote them as GA-SSELM, DE-SSELM, and GS-SSELM in the following. We run these algorithms on the same machine with the same software and time the training phase of each algorithm. Figure 4 gives the performance comparison, while Table 5 gives the average training time.

Figure 4.

Comparison of (a) average error in coordinate prediction and (b) macro accuracy in regions recognition.

Table 5.

Comparison of average training time with different parameter selection methods.

	GS-SSELM	PSO-SSELM	DE-SSELM	GA-SSELM
Training time (s)	1133.20	512.10	539.79	661.23

As observed from Figure 4 and Table 5, PSO-SSELM, DE-SSELM, GA-SSELM, and GS-SSELM reach the same performance level, since all these parameter selection algorithms follow the same optimization strategy, namely the fitness functions (17) and (18). The same level here does not mean strictly equal; there are some differences among them. However, the differences are usually within 1 m or 1%, which is small enough to ignore in outdoor localization.

In terms of training time, the brute-force grid search takes much longer than the three heuristic algorithms. Among the heuristic evolutionary algorithms, GA-SSELM takes longer to train than PSO-SSELM and DE-SSELM, while PSO and DE are at the same level. This is because PSO and DE usually converge earlier than GA. Again, the same level does not mean strictly equal. Both PSO and DE are heuristic search algorithms with randomness, so we cannot guarantee that the convergence rate is unaffected by this stochastic factor and by other experimental factors; therefore, a training time difference within 2% can be ignored. Based on the above analysis, we conclude that PSO and DE are reasonable parameter selection tools.
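The PSO-based selection compared above can be sketched in a few lines. This is a generic global-best PSO with an inertia weight, not the exact configuration used in the experiments: the swarm size, iteration count, coefficients `inertia`, `c1`, `c2`, and the `toy_fitness` bowl standing in for the fitness functions (17) and (18) are all assumptions for illustration. The search runs over $(x_{1}, x_{2})$, the log-scaled hyper parameters.

```python
import numpy as np

def pso_minimize(fitness, lower, upper, n_particles=20, n_iter=50,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal global-best PSO; returns (best position, best fitness)."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size

    pos = rng.uniform(lower, upper, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()                                  # personal best positions
    pbest_val = np.array([fitness(p) for p in pos])
    g = int(np.argmin(pbest_val))
    gbest, gbest_val = pbest[g].copy(), pbest_val[g]    # global best

    for _ in range(n_iter):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Velocity update: inertia + pull toward personal and global bests.
        vel = inertia * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lower, upper)          # stay inside the search box

        vals = np.array([fitness(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]

        g = int(np.argmin(pbest_val))
        if pbest_val[g] < gbest_val:
            gbest, gbest_val = pbest[g].copy(), pbest_val[g]

    return gbest, float(gbest_val)

def toy_fitness(x):
    # Hypothetical smooth bowl with its minimum at x1 = -1, x2 = -4.
    return (x[0] + 1.0) ** 2 + 0.25 * (x[1] + 4.0) ** 2

best_x, best_val = pso_minimize(toy_fitness, lower=[-10.0, -10.0], upper=[0.0, 0.0])
print(best_x, best_val)
```

This sketch evaluates the fitness 20 × 51 = 1020 times. In the real setting each evaluation trains one SS-ELM, so the number of evaluations an optimizer needs before converging largely determines its training time.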

Conclusion

Outdoor localization based on RSS in the cellular network usually needs labeled training data to be collected. By implementing SS-ELM, we can reduce the demand for labeled training data without lowering the precision level. However, the calibration of the hyper parameters of SS-ELM is another labor-consuming job. To address this problem, in this article, we introduce a hybrid PSO and SS-ELM algorithm that optimizes the hyper parameters automatically, so that SS-ELM can achieve optimal performance under different conditions.

The experiments demonstrate that by importing a large amount of unlabeled training data in the training phase, PSO-SSELM gains comprehensive information for selecting the optimal hyper parameters, which improves the stability and the accuracy of SS-ELM.

We replace PSO with other parameter selection algorithms to explore the performance differences between them. The experiment shows that all the parameter selection algorithms perform at the same level, and that PSO even performs better in some cases. Moreover, PSO and DE take the shortest training time among all the compared algorithms.

The iterations required in the training phase of PSO-SSELM make its training time longer than that of SS-ELM. However, considering the overall localization performance and the labor saved in calibration, we think that the idea in this article is worth referencing for outdoor localization.

The research on PSO-SSELM is still in progress, and some shortcomings remain to be overcome. First, we use artificial data in the validation, and we plan to deploy PSO-SSELM in a real environment in future research. Second, although PSO-SSELM outperforms the other methods for outdoor location estimation, its accuracy still needs to be improved. To this end, we plan to include the parameters of the Laplacian matrix in the optimization phase.

Footnotes

Acknowledgements

The authors are grateful for the constructive advice on the revision of the manuscript from the anonymous reviewers.

Academic Editor: Stefano Savazzi

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The work described in this paper was funded by the Engineering and Technology Research Center of Guangdong Province for Logistics Supply Chain and Internet of Things (Project no. GDDST[2016]176); the 3rd strategic rising industry program of Guangdong Province (Project no. 2012556003); International Cooperation Special Program for platform (Project no. 2012J510018); the key lab of cloud computing and big data in Guangzhou (Project no. SITGZ[2013]268-6); Engineering & Technology Research Center of Guangdong Province for Big Data Intelligent Processing (Project no. GDDST[2013]1513-1-11); IoT home wireless router system and RFID (Project no. GDEID2012IS054); and the Promotion of the industrialization of Family Information Platform (Project no. 2013B090200055).

References

1. A survey of fingerprint-based outdoor localization. IEEE Commun Surv Tut 2016; 18(1): 491–506.

2. Liu, Luo, Zou. A low-cost and accurate indoor localization algorithm using label propagation based semi-supervised learning. In: Proceedings of the 5th international conference on mobile ad-hoc and sensor networks (MSN’09), Fujian, China, 14–16 December 2009, pp.108–111. New York: IEEE.

3. Sadiq, Valaee. Automatic device-transparent RSS-based indoor localization. In: Proceedings of the global communications conference (GLOBECOM), San Diego, CA, 6–10 December 2015, pp.1–6. New York: IEEE.

4. Lin, Zhao, Luo. A wireless localization algorithm based on spectral decomposition of the graph Laplacian. Acta Automat Sin 2011; 37(3): 316–321.

5. Tikhonov. Solution of incorrectly formulated problems and the regularization method. Soviet Math Dokl 2014; 4: 1035–1038.

6. Belkin, Niyogi, Sindhwani. Manifold regularization: a geometric framework for learning from labeled and unlabeled examples. J Mach Learn Res 2006; 7(Nov): 2399–2434.

7. Belkin, Matveeva, Niyogi. Regularization and semi-supervised learning on large graphs. In: Shawe-Taylor, Singer (eds) International conference on computational learning theory. Berlin: Springer, 2004, pp.624–638.

8. Belkin, Niyogi, Sindhwani. On manifold regularization. Society for artificial intelligence and statistics, pp.17–24, http://www.gatsby.ucl.ac.uk/aistats/

9. Melacci, Belkin. Laplacian support vector machines trained in the primal. J Mach Learn Res 2011; 12: 1149–1184.

10. Huang, Song, Gupta. Semi-supervised and unsupervised extreme learning machines. IEEE T Cybernetics 2014; 44(12): 2405–2417.

11. Liu, Chen, Liu. SELM: semi-supervised elm with application in sparse calibrated location estimation. Neurocomputing 2011; 74(16): 2566–2572.

12. Huang, Zhu, Siew. Extreme learning machine: a new learning scheme of feedforward neural networks. In: Proceedings of the 2004 IEEE international joint conference on neural networks, Budapest, 25–29 July 2004, volume 2, pp.985–990. New York: IEEE.

13. Huang, Zhu, Siew. Extreme learning machine: theory and applications. Neurocomputing 2006; 70(1): 489–501.

14. Iosifidis, Tefas, Pitas. Regularized extreme learning machine for multi-view semi-supervised action recognition. Neurocomputing 2014; 145: 250–262.

15. Shu. Evolutionary extreme learning machine-based on particle swarm optimization. Berlin, Heidelberg: Springer Berlin Heidelberg, 2006, http://dx.doi.org/10.1007/1175996695.

16. Zhu, Qin, Suganthan. Evolutionary extreme learning machine. Pattern Recognit 2005; 38(10): 1759–1763.

17. Cao, Lin, Huang. Self-adaptive evolutionary extreme learning machine. Neural Process Lett 2012; 36: 285–305.

18. Han, Yao, Ling. An improved evolutionary extreme learning machine based on particle swarm optimization. Neurocomputing 2013; 116: 87–93.

19. Parrott. Locating and tracking multiple dynamic optima by a particle swarm model using speciation. IEEE T Evolut Comput 2006; 10(4): 440–458.

20. Langdon, Poli. Evolving problems to learn about particle swarm optimizers and other search algorithms. IEEE T Evolut Comput 2007; 11(5): 561–578.

21. Kennedy, Eberhart. Particle swarm optimization. In: Proceedings of the international conference on neural networks, Perth, WA, Australia, 27 November–1 December 1995, pp.1942–1948. New York: IEEE.

22. Eberhart, Kennedy. A new optimizer using particle swarm theory. In: Proceedings of the international symposium on MICRO machine and human science, Nagoya, Japan, 4–6 October 1995, pp.39–43. New York: IEEE.

23. Shi, Eberhart. A modified particle swarm optimizer. In: Proceedings of the 1998 IEEE international conference on evolutionary computation. IEEE world congress on computational intelligence, Anchorage, AK, 4–9 May 1998, pp.69–73. New York: IEEE.

24. Mellios, Hilton, Nix. Ray-tracing urban picocell 3D propagation statistics for LTE heterogeneous networks. In: Proceedings of the 2013 7th European conference on antennas and propagation (EuCAP), Gothenburg, 8–12 April 2013, pp.4015–4019. New York: IEEE.

25. Pedregosa, Varoquaux, Gramfort. Scikit-learn: machine learning in Python. J Mach Learn Res 2011; 12: 2825–2830.

26. Holland. Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control and artificial intelligence. Cambridge, MA: MIT Press, 1992.

27. Price. Differential evolution: a fast and simple numerical optimizer. In: Proceedings of the 1996 biennial conference of the North American fuzzy information processing society (NAFIPS), Berkeley, CA, 19–22 June 1996, pp.524–527. New York: IEEE.

28. Lerman. Fitting segmented regression models by grid search. Appl Stat 1980; 29: 77–84.

Hybrid particle swarm optimization and semi-supervised extreme learning machine for cellular network localization
