Sage Journals: Discover world-class research

Abstract

Time series are widely used in the wireless Internet of Things. This article proposes a nonlinear time series prediction algorithm based on the Small-World Scale-Free Network after the AIC-Optimized Subtractive Clustering Algorithm (AIC-DSCA-SSNET, AD-SSNET) to predict nonlinear and unstable time series with improved accuracy. The AD-SSNET is introduced as the reservoir of an echo state network to improve the predictive capability for nonlinear time series, and it is combined with an artificial intelligence method to construct the training samples of the prediction model. First, the optimal clustering scheme of randomly distributed neurons in the network is adaptively obtained by the AIC-DSCA; then the AD-SSNET is constructed according to the intra-cluster priority connection algorithm. Finally, the reservoir synaptic matrix is calculated according to the synaptic information. Experimental results show that the proposed algorithm extends the feasible range of spectral radii of the reservoir and improves the prediction accuracy of nonlinear time series, which is significant for time series analysis in the era of the wireless Internet of Things.

Keywords

Time series, nonlinear, small world, scale-free, reservoir, prediction accuracy

Introduction

With the advent of 5G and the Internet of Things era, massive amounts of data are emerging in many known and unknown fields and flooding the world on which human beings depend. The analysis of these data has brought great challenges to scholars and has become a hot spot of artificial intelligence research.^1–3 Time series data refer to a sequence of observations collected consistently over a period of time. Time series analysis by machine learning methods mainly includes clustering, classification, anomaly detection, and prediction, which can bring significant benefits in various vertical fields.^4,5 Therefore, this article studies time series prediction to provide more possibilities for analyzing the large number of time series in the wireless Internet of Things.

Time series prediction is widely used in the fields of industry,⁶ economy,⁷ environment,⁸ and so on. However, most of the time series show strong nonlinear characteristics in the real world. Therefore, it is necessary to construct a prediction model by using a nonlinear prediction method to improve the model’s fitting ability to the nonlinear time series data. At present, nonlinear prediction methods are mainly divided into two categories. One is the regression method,^9–13 which is suitable for time series prediction with slower change, and the nonlinear characteristics of the time series are easily eliminated by its linearization process. The other is to predict by the neural network in machine learning,^14–16 especially represented by echo state network (ESN).^17,18 It has a large, sparsely connected reservoir, and its learning method is efficient.^19,20 The approximation capability of nonlinear time series is mainly ensured through its reservoir. However, the random connection of internal neurons in the reservoir of traditional ESN leads to the randomness of the network structure, making the model training purposeless and poorly adaptable, and unable to meet the requirements of effective prediction to nonlinear time series.^21,22 Therefore, it is necessary to analyze and improve the network structure of the reservoir.

To improve the performance of the reservoir, some scholars proposed replacing the random network with a small-world network.^23–26 The small-world network is a kind of network structure that can reflect the real world. It has both a short average characteristic path length (ACPL) and a high average clustering coefficient (ACC), and it combines the advantages of random networks and regular networks. The literature^27–30 proposed a small-world echo state network (SWESN), which used a small-world network to improve the structure of the reservoir and thereby improve prediction accuracy and adaptability.^31–33 Kawai et al.³⁴ studied the performance of the reservoir under three different topologies (regular network, small-world network, and random network) and demonstrated the superiority of the small-world network. However, the nodes of real-world social networks are randomly distributed at birth and gradually form a life circle. Therefore, some scholars applied this idea to improve ESN through clustering methods: the internal neurons of the reservoir are clustered, and the synaptic matrix is constructed according to the clustering information. The predictive model is then constructed to improve the nonlinear prediction capability.³⁵ Deng and Zhang³⁶ proposed a scale-free highly clustered echo state network (SHESN), whose reservoir is uniformly clustered with both small-world and scale-free characteristics. It was successfully applied to Mackey–Glass (MG) and laser time series prediction and obtained higher prediction accuracy than ESN. Xue et al.³⁷ applied SHESN to financial time series prediction and achieved better prediction performance, which showed that the highly clustered scale-free network has strong computing power. Lei et al.³⁸ proposed a complex ESN based on prior clusters, in which the results of power spectrum analysis are used as prior knowledge to construct subclusters for predicting traffic flow time series with multi-period characteristics. Najibi and Rostami³⁹ used the k-means algorithm to optimize the clustering effect of the reservoir in SHESN. However, the number of cluster heads in the reservoir must be pre-set according to prior knowledge.

This article addresses the problem that the above methods rely too heavily on prior knowledge when clustering. To improve the clustering performance, a nonlinear time series prediction algorithm based on the Small-World Scale-Free Network after the AIC-Optimized Subtractive Clustering Algorithm (AIC-DSCA-SSNET, AD-SSNET) is proposed. It adaptively obtains the optimal clustering scheme and constructs a complex clustering network with small-world and scale-free characteristics. The clustering network is used as a reservoir to improve the prediction accuracy of nonlinear time series, which is significant for time series analysis in the era of the wireless Internet of Things.

Nonlinear time series prediction algorithm based on AD-SSNET

AD-SSESN architecture

On the basis of ESN architecture, the AD-SSESN model is constructed by using the AD-SSNET as the reservoir to improve the nonlinear approximation capability. The AD-SSESN architecture is shown in Figure 1.

Figure 1.

AD-SSESN architecture.

The AD-SSESN architecture has three layers, and its reservoir is the AD-SSNET. Its state update equation and output equation are as follows:

x (t) = f (W_{in} u (t) + W_{res} x (t - 1))

(1)

y (t) = x^{T} (t) W_{out}

(2)

where $u (t)$ denotes the input vector, $x (t)$ denotes the state vector of the reservoir, $y (t)$ denotes the output vector, $f (\cdot) = \tanh (\cdot)$ denotes the activation function, $W_{in}$ denotes the input synaptic matrix, $W_{res}$ denotes the reservoir synaptic matrix, and $W_{out}$ denotes the output synaptic matrix. Here, $W_{in}$ is randomly generated before network training, and $W_{res}$ is generated by constructing the AD-SSNET; neither of them changes during the training process. $W_{out}$ is calculated by the least-squares method.⁴⁰
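Equations (1) and (2) can be illustrated with a minimal NumPy sketch. The reservoir size, the weight ranges, the zero initial state, and the sine-wave task are assumptions chosen only for illustration, and the random `W_res` below is a stand-in for the AD-SSNET matrix rather than the construction proposed in this article.

```python
import numpy as np

def run_reservoir(W_in, W_res, inputs):
    """Collect the states x(t) = tanh(W_in u(t) + W_res x(t-1)) of Eq. (1)."""
    x = np.zeros(W_res.shape[0])           # x(0) = 0 is an assumed initial state
    states = []
    for u in inputs:
        x = np.tanh(W_in @ u + W_res @ x)
        states.append(x)
    return np.array(states)                # shape (T, n_reservoir)

def train_readout(states, targets):
    """Least-squares W_out such that states @ W_out approximates targets (Eq. (2))."""
    W_out, *_ = np.linalg.lstsq(states, targets, rcond=None)
    return W_out

rng = np.random.default_rng(0)
n_res, T = 50, 300
W_in = rng.uniform(-0.5, 0.5, size=(n_res, 1))
W_res = rng.uniform(-0.1, 0.1, size=(n_res, n_res))   # toy stand-in for the AD-SSNET matrix

signal = np.sin(0.2 * np.arange(T + 1))
inputs = signal[:-1].reshape(-1, 1)        # u(t)
targets = signal[1:].reshape(-1, 1)        # one-step-ahead target
X = run_reservoir(W_in, W_res, inputs)
W_out = train_readout(X, targets)
y = X @ W_out                              # y(t) = x^T(t) W_out for every t
```

Only `W_out` is fitted; `W_in` and `W_res` stay fixed, which is what makes the training step a single linear least-squares problem.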

AD-SSNET generation method

How to build a reservoir synaptic matrix $W_{res}$ that has better performance is a key point in generating the AD-SSNET model. This article proposes a generation algorithm of reservoir synaptic matrix, and its flow diagram is shown in Figure 2.

Figure 2.

Flow diagram of AD-SSNET model generation.

First, the AIC-DSCA optimal clustering algorithm is studied to cluster randomly distributed neurons. Then, the small-world scale-free network is built by the clustering result. Finally, the synaptic information between two neurons is extracted, and the reservoir synaptic matrix $W_{res}$ is calculated according to it.

Determination of optimal clustering scheme by AIC-DSCA

The AIC-DSCA is used to obtain the optimal clustering scheme adaptively for randomly distributed neurons. First, the dynamic subtractive clustering algorithm (DSCA) is studied. The maximum intra-cluster distance variance as the evaluation index is proposed to find the optimal clustering scheme under different cluster head numbers. Second, the Akaike information criterion (AIC) of DSCA is introduced to determine the optimal clustering scheme by calculating the optimal cluster heads. The specific steps are as follows:

1. DSCA

The parameter combination $Γ = [r_{a}, θ]$ of the traditional subtractive clustering algorithm (SCA) is usually set by relying heavily on expert experience, and different parameter combinations lead to different clustering effects. Therefore, we propose the DSCA, which proceeds as follows.

First, the cluster head numbers $k = 1, 2, 3, 4, 5, \ldots$ are set, and the SCA is performed to obtain candidate clustering schemes. Let the coordinates of the $N$ randomly distributed neurons be $M_{1} = (x_{1}, y_{1}), M_{2} = (x_{2}, y_{2}), \ldots, M_{N} = (x_{N}, y_{N})$, and calculate the density value of each neuron in the network. The greater the density value of a neuron, the more neurons lie in its neighborhood, and the higher its probability of becoming a cluster head. When calculating the first cluster head position, the density function $f_{i}^{1}$ of neuron $i$ is expressed as follows:

f_{i}^{1} = \sum_{j = 1}^{N} \exp (- \frac{{‖ M_{i} - M_{j} ‖}^{2}}{{(r_{a} / 2)}^{2}})

(3)

where $r_{a}$ is a constant that defines a neighborhood; neurons outside this neighborhood have little effect on the density value of neuron $i$. The maximum density value $f_{opt}^{1} = \max \{f_{i}^{1}\}$ is selected, and the corresponding neuron $M_{opt}^{1} = (x_{i}, y_{i})$ is the first cluster head. After the $z$th cluster head is determined, its influence on the density of the other neurons must be excluded before the $(z + 1)$th cluster head is selected, so the neuron density function needs to be modified. The modified density function $f_{i}^{z + 1}$ of neuron $i$ is expressed as follows:

f_{i}^{z + 1} = f_{i}^{z} - f_{opt}^{z} \times \sum_{j = 1}^{N} \exp (- \frac{{‖ M_{i} - M_{opt}^{z} ‖}^{2}}{{(r_{b} / 2)}^{2}})

(4)

where $r_{b}$ is a constant and $r_{b} = 1.5 r_{a}$. The updated maximum density value $f_{opt}^{z + 1} = \max \{f_{i}^{z + 1}\}$ is selected, and the corresponding neuron $M_{opt}^{z + 1} = (x_{i}, y_{i})$ is the $(z + 1)$th cluster head. When $f_{opt}^{z + 1} / f_{opt}^{1} \leq θ$, $θ \in (0, 1)$, this run of the SCA ends and $g$ cluster heads have been formed in the network.
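One run of the SCA for a fixed $Γ = [r_{a}, θ]$ can be sketched in plain Python as follows. Two readings are assumptions here: the density revision is written in the usual subtractive-clustering form, with one exponential term per selected head, and the neuron whose density ratio first falls to $θ$ is not itself counted as a head. The function name and the toy grid data are hypothetical.

```python
import math

def subtractive_clustering(points, r_a, theta):
    """Return the indices of the cluster heads found by one SCA run."""
    r_b = 1.5 * r_a

    def sq_dist(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

    # Initial density of every neuron, Eq. (3).
    density = [sum(math.exp(-sq_dist(p, q) / (r_a / 2) ** 2) for q in points)
               for p in points]
    first_peak = max(density)
    heads = []
    while len(heads) < len(points):
        best = max(range(len(points)), key=density.__getitem__)
        peak = density[best]
        # Stop once the best remaining density is at most theta * first peak.
        if heads and peak / first_peak <= theta:
            break
        heads.append(best)
        # Remove the influence of the new head from every density value.
        density = [d - peak * math.exp(-sq_dist(p, points[best]) / (r_b / 2) ** 2)
                   for d, p in zip(density, points)]
    return heads

# Two 3 x 3 grids of neurons, far apart from each other.
offsets = (-0.5, 0.0, 0.5)
points = [(cx + dx, cy + dy) for cx, cy in ((0.0, 0.0), (10.0, 10.0))
          for dx in offsets for dy in offsets]
heads = subtractive_clustering(points, r_a=2.0, theta=0.5)
```

On this toy layout the run returns the two grid centers, so $g = 2$; a different $Γ$ would generally give a different $g$, which is why the DSCA scans over parameter configurations.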

If $g$ equals $k$, the parameter configuration $Γ = [r_{a}, θ]$ is recorded as a candidate clustering scheme for $k$ cluster heads. The parameter configurations are traversed and the above SCA is repeated until all candidate clustering schemes for $k$ cluster heads have been found.

Finally, if there is more than one candidate scheme for the same number of cluster heads, the optimal clustering scheme must be chosen with an evaluation index. For each candidate scheme, each neuron is assigned to the nearest cluster head by the nearest distance principle. According to the distance from each neuron of the $j$th cluster to its cluster head under $k$ cluster heads, the distance variance $D_{j} (j = 1, 2, \ldots, k)$ of the $j$th cluster is calculated as:

D_{j} = \frac{1}{Q} \sum_{i = 1}^{Q} {(\bar{d} - d_{i})}^{2}, j = 1, 2, \ldots, k

(5)

where $d_{i}$ denotes the distance from the ith member in the jth cluster to the cluster head, $\bar{d}$ denotes the average distance from all members of the jth cluster to cluster head, and $Q$ denotes the number of all neurons in the jth cluster.

The maximum distance variance $\max \{D_{j} \mid j = 1, 2, \ldots, k\}$ is calculated as the evaluation index. If there are multiple candidate clustering schemes under $k$ cluster heads, the one with the minimum evaluation index is selected as the optimal clustering scheme.
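The evaluation index can be sketched as below. The sketch assumes that the variance of a cluster is the mean squared deviation of its member-to-head distances, that is, the sum in equation (5) divided by the cluster size $Q$; the function name and the five-point example are hypothetical.

```python
import math

def max_intra_cluster_variance(points, head_indices):
    """Assign each neuron to its nearest head and return the largest D_j."""
    distances = {h: [] for h in head_indices}
    for p in points:
        nearest = min(head_indices, key=lambda h: math.dist(p, points[h]))
        distances[nearest].append(math.dist(p, points[nearest]))
    variances = []
    for dists in distances.values():
        mean = sum(dists) / len(dists)
        variances.append(sum((mean - d) ** 2 for d in dists) / len(dists))
    return max(variances)

points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]
candidates = [[0, 3], [1, 3]]   # two candidate schemes, both with k = 2 heads
best = min(candidates, key=lambda heads: max_intra_cluster_variance(points, heads))
```

Here the scheme with heads at indices 1 and 3 is preferred, because placing the first head in the middle of its cluster makes the member distances more uniform.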

2. The AIC criterion of DSCA

After the optimal clustering scheme under each cluster head number is obtained, the optimal number of cluster heads must be selected. The AIC is an evaluation index proposed by H. Akaike in the study of order determination for time series models. Its distinctive feature is the “principle of parsimony,” and its definition is as follows:

AIC = - 2 \ln (l) + 2 r

(6)

where $l$ is the maximum likelihood estimation function of the model, and $r$ is the number of independent parameters of the model. In general, while the log-likelihood $\ln (l)$ grows quickly as $r$ increases, the $AIC$ value decreases. When $r$ becomes too large, the growth of $\ln (l)$ slows, the $AIC$ value increases, and the model tends to over-fit. Therefore, the model is best when the $AIC$ value is the smallest.

The AIC criterion of DSCA is as follows. Let the number of neurons be $N$ and the number of cluster heads be $k$. Then the distribution of cluster heads is $M = [M_{opt}^{1}, M_{opt}^{2}, \ldots, M_{opt}^{k}]$, the number of neurons in each cluster is $Q_{i} (i = 1, 2, \ldots, k)$, and over all clusters the maximum distance variance is $v_{max} = \max \{D_{i} \mid i = 1, 2, \ldots, k\}$ and the minimum distance variance is $v_{min} = \min \{D_{i} \mid i = 1, 2, \ldots, k\}$. The distribution density function of the intra-cluster distance variance is then:

f (v_{i}) = \frac{Q_{i} / N}{(v_{max} - v_{min}) / k} = \frac{k}{N} \frac{Q_{i}}{v_{max} - v_{min}}, i = 1, 2, \ldots, k

(7)

Therefore, according to the log maximum likelihood estimation function, the intra-cluster distance variance likelihood estimation function $l$ is as follows:

l = \sum_{i = 1}^{k} \frac{k}{N} \frac{Q_{i}}{v_{max} - v_{min}}

(8)

AIC = - \ln (\sum_{i = 1}^{k} \frac{k}{N} \frac{Q_{i}}{v_{max} - v_{min}}) + 2 k

(9)
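Equations (7) to (9) can be evaluated directly from the cluster sizes $Q_{i}$ and the distance variances $D_{i}$ of one clustering scheme. The sketch below follows equation (9) as printed; the function name and the sample numbers are hypothetical, and it assumes $k \geq 2$ with $v_{max} > v_{min}$ so that the logarithm is defined.

```python
import math

def dsca_aic(cluster_sizes, cluster_variances):
    """AIC value of one clustering scheme, following Eqs. (7)-(9)."""
    k = len(cluster_sizes)
    n = sum(cluster_sizes)
    spread = max(cluster_variances) - min(cluster_variances)   # v_max - v_min
    likelihood = sum((k / n) * q / spread for q in cluster_sizes)
    return -math.log(likelihood) + 2 * k

aic_two_clusters = dsca_aic([4, 6], [1.0, 3.0])
```

Because the $Q_{i}$ sum to $N$, the likelihood term reduces to $k / (v_{max} - v_{min})$, so for a fixed $k$ the value depends only on the spread of the intra-cluster variances.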

According to equation (9), the cluster head numbers with the smallest $AIC$ are the optimal cluster heads, and the optimal clustering scheme under the optimal cluster number is used as the final optimal clustering scheme. The flow diagram of the AIC-DSCA is shown in Figure 3 and the specific steps are as follows:

Step 1: Set the parameter configuration $Γ = [r_{a}, θ]$ and $k = 1$ .

Step 2: Perform DSCA on $N$ neurons to calculate the cluster head numbers.

Step 3: If the number of cluster heads equals $k$, proceed to Step 4. Otherwise, modify the parameter configuration $Γ = [r_{a}, θ]$ and return to Step 2.

Step 4: After obtaining all candidate clustering schemes under $k$ cluster heads, the optimal clustering scheme could be selected according to the intra-cluster maximum distance variance.

Step 5: Calculate the $AIC$ value of the optimal clustering scheme under $k$ cluster heads.

Step 6: If the $AIC$ value is the minimum value, the optimal cluster head number is current $k$ , and the final optimal scheme is determined. Otherwise $k = k + 1$ and proceed to Step 2.

Algorithm 1: AIC-DSCA
Input: the parameter configuration $Γ = [r_{a}, θ]$ and $k = 1$
Output: the optimal clustering scheme and the optimal cluster head number $k$
1: for $k = 1; k \leq N; k++$ do
2:   for $i = 1; i \leq 100; i++$ do
3:     for $j = 1; j \leq 100; j++$ do
         Get the cluster head number $g$ by SCA; $g \leftarrow Γ = [r_{a} (i), θ (j)]$
4:       if $g == k$ then
           determine the optimal scheme under the optimal cluster head number $k$
           Calculate $AIC$ value
5:       end if
6:     end for
7:   end for
8:   if $AIC = \min (AIC)$ then
       determine the optimal scheme and the optimal cluster head number $k$
       Goto Wait for FINISH
9:   end if
10: end for

Figure 3.

Flow diagram of the AIC-DSCA.

Construction of SSNET

According to the clustering result of neurons, the small-world scale-free network is constructed by intra-cluster connections and inter-cluster connections. Inter-cluster connections will be fully connected for all cluster heads, and the way of intra-cluster connections is as follows.

First, the neurons are divided into two types. The cluster head neurons are backbone neurons, and the neurons close to their backbone neuron are local neurons. The candidate neighbors of a new local neuron are the set of neurons to which this new local neuron is allowed to connect. Consider a circle whose center is the location of the backbone neuron of the current cluster and whose radius is the Euclidean distance from the new local neuron to that backbone neuron. The existing local neurons inside the circle are defined as the candidate neighbors of the newly added local neuron. The backbone neuron of the current cluster is always one of the candidate neighbors.

Then, local neurons within the cluster are chosen according to the distance from the backbone neurons and the connections are established with the existing candidate neighbor neurons.

$N_{max}$ denotes the maximum number of connections of a new local neuron and controls the density of intra-cluster connections. $N_{c}$ denotes the number of candidate neighbors of a new local neuron. The new local neuron is connected according to the following rules of the intra-cluster priority connection algorithm:

if $N_{max} \geq N_{c}$ , the new local neuron is fully connected to all the candidate neighbor neurons;

if $N_{max} < N_{c}$ , the new local neuron is connected to candidate neighbor neuron $i$ with the following probability:

\frac{s_{i}}{\sum_{i \in C} s_{i}}

(10)

The number of connections of a neuron is called degree. Here, $s_{i}$ is the degree of current neuron $i$ , and $C$ is the candidate neighbor neurons of the new local neuron. Neurons prefer to connect to neurons that already have more connections according to the scale-free criterion. Therefore, the probability that a new local neuron is connected to an existing neuron is proportional to the degree of the existing neurons.
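The two connection rules can be sketched as follows. How many links are drawn when $N_{max} < N_{c}$ is an assumption here: the sketch draws $N_{max}$ distinct neighbors, each draw weighted by degree as in equation (10). The function names and the example degrees are hypothetical, and every candidate is assumed to have a positive degree.

```python
import random

def connection_probabilities(degrees):
    """Eq. (10): probability of each candidate, proportional to its degree."""
    total = sum(degrees.values())
    return {node: s / total for node, s in degrees.items()}

def choose_neighbors(degrees, n_max, rng):
    """Pick the neighbors of a new local neuron from its candidate set."""
    candidates = list(degrees)
    if n_max >= len(candidates):
        return set(candidates)               # fully connect to all candidates
    chosen = set()
    while len(chosen) < n_max:
        remaining = [c for c in candidates if c not in chosen]
        weights = [degrees[c] for c in remaining]
        chosen.add(rng.choices(remaining, weights=weights, k=1)[0])
    return chosen

degrees = {"a": 3, "b": 1, "c": 4}           # degree s_i of each candidate neighbor
probabilities = connection_probabilities(degrees)
neighbors = choose_neighbors(degrees, n_max=2, rng=random.Random(0))
```

Candidate `c` carries half of the total degree, so it is the most likely first choice, which is the "rich get richer" behavior that produces a scale-free degree distribution.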

The flow diagram of the SSNET construction is shown in Figure 4 and its specific steps are as follows:

Step 1: Choose a cluster. All neurons are divided into backbone neurons and local neurons.

Step 2: Choose a new local neuron according to the distance from the backbone neurons, and calculate the number of candidate neighbor neurons.

Step 3: Set the maximum number of connections, and let the new local neuron connect to its candidate neighbor neurons according to the intra-cluster priority connection algorithm. If all the new local neurons in the same cluster have been added, proceed to Step 4; otherwise, return to Step 2.

Step 4: If all the clusters have completed the intra-cluster connections, proceed to Step 5, otherwise, proceed to Step 1.

Step 5: Let all cluster heads make a full connection, and build the small-world scale-free network.

Figure 4.

Flow diagram of network construction.

Algorithm 2: AD-SSNET Construction
Input: the optimal clustering scheme ( $cidx, center$ ) and the optimal cluster head number $k$
Output: the reservoir synaptic matrix $W_{res}$
1: for $i = 1; i \leq N; i++$ do
2:   for $j = 1; j \leq N; j++$ do
       $DisMatrix (i, j) \leftarrow Position (i, j)$
3:   end for
4: end for
5: for $i = 1; i \leq k; i++$ do
     $ClassNum (i); BackboneNode (i); BoneDis (i); DisSort (i) \leftarrow cidx (i), center (i)$
     $M (i) = BackboneNode (i)$; $N_{c} = 1$
6:   for $j = 1; j \leq length (DisSort (i)); j++$ do
7:     for $t = 1; t \leq length (M (i)); t++$ do
8:       if $DisMatrix (M (t, i), ClassNum (DisSort (j, i), i)) \leq BoneDis (DisSort (j, i), i)$ then
           $N_{c} = N_{c} + 1$
           $NeighborNum \leftarrow t, j$
9:         if $N_{max} \geq N_{c}$ then
             $W_{res} (ClassNum (DisSort (j, i), i), NeighborNum (t)) = 1$
             $W_{res} (NeighborNum (t), ClassNum (DisSort (j, i), i)) = 1$
10:        else
             $W_{res} (ClassNum (DisSort (j, i), i), NeighborNum (t)) = s_{i} / \sum_{i \in C} s_{i}$
             $W_{res} (NeighborNum (t), ClassNum (DisSort (j, i), i)) = s_{i} / \sum_{i \in C} s_{i}$
11:        end if
12:      end if
13:    end for
14:  end for
15: end for
16: for $i = 1; i \leq k - 1; i++$ do
17:   for $j = i + 1; j \leq k; j++$ do
        $W_{res} (BackboneNode (i), BackboneNode (j)) = 1$
        $W_{res} (BackboneNode (j), BackboneNode (i)) = 1$
18:   end for
19: end for

Therefore, the AD-SSESN prediction model construction and training process of the prediction algorithm in this article are as follows:

Step 1: Initialize the model parameters.

Step 2: Obtain the AD-SSESN prediction model by constructing the AD-SSNET as a reservoir.

Step 3: Obtain the internal state matrix $X$ and the corresponding expected output matrix $Y$ by calculating and collecting the internal state vectors $x (t)$ and output vectors $y (t)$ of the reservoir by using the training datasets.

Step 4: Calculate the output weight $W_{out}$ by the least-square method, and then the trained AD-SSESN prediction model is obtained to predict nonlinear time series.

Analysis of simulation

Analysis of cluster

In total, 1000 neurons were randomly distributed on the plane of $300 \times 300$ . The AIC-DSCA is used for clustering to obtain the optimal clustering scheme. First, the DSCA is used to select the optimal scheme under different cluster head numbers, and the results are shown in Table 1.

Table 1.

Clustering results under different cluster head numbers.

Cluster head numbers	Candidate scheme numbers	Indicator	Optimal scheme	$r_{a}$	$θ$
7	178	476.1114	161	0.46	0.12
8	213	454.2209	51	0.28	0.65
9	262	444.4690	123	0.35	0.31
10	198	411.4365	136	0.37	0.15
11	97	321.0013	77	0.33	0.27
12	140	265.0910	140	0.42	0.01

Table 1 lists the clustering information under the number of cluster heads from 7 to 12. It can be seen that the clustering scheme is diverse under the same cluster head number. Therefore, the optimal clustering scheme under each cluster head number is selected by an indicator (the smallest “intra-cluster maximum distance variance”).

After obtaining the optimal clustering scheme under each cluster head number, the final optimal cluster head number and its optimal clustering scheme could be selected by the AIC of the DSCA. The $AIC$ values of the above optimal clustering schemes are calculated and compared, and the results are shown in Table 2.

Table 2.

$AIC$ values under different cluster head numbers.

Cluster head numbers	$AIC$ values
7	10.52
8	8.24
9	6.16
10	5.56
11	7.13
12	9.55

The number of cluster heads with the minimum $AIC$ value is chosen as the optimal one, so the number of the optimal cluster heads selected by the AIC-DSCA is 10; its clustering result is shown in Figure 5.

Figure 5.

Optimal clustering result based on AIC-DSCA.

Analysis of network characteristics

After the optimal clustering scheme is obtained, the AD network is constructed by the method explained in section “Construction of SSNET.” The small-world characteristics and the scale-free characteristics of the AD network are analyzed.

Analysis of small-world characteristics

The small-world characteristics in the complex network can be characterized by its ACPL and ACC. When the ACPL is small and the ACC is large, the small-world characteristics of network are better.⁴¹ The ACPL and ACC of the parent network and each subnet based on the AD network are shown in Table 3. The ACPL and ACC of clustering schemes under different cluster head numbers are shown in Table 4. The ACPL and ACC of random-network, small-world network, and high-clustering scale-free network under the same scale are shown in Table 5.
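Both indicators can be computed from an adjacency structure alone. The sketch below uses the standard definitions for an undirected, unweighted graph (mean shortest-path length over connected pairs, and mean local clustering coefficient); the function names and the four-node example graph are hypothetical.

```python
from collections import deque
from itertools import combinations

def average_path_length(adj):
    """ACPL: mean shortest-path length over all ordered pairs of connected nodes."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:                         # breadth-first search from source
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

def average_clustering(adj):
    """ACC: mean over nodes of (links among neighbors) / (possible such links)."""
    coefficients = []
    for node, neighbors in adj.items():
        k = len(neighbors)
        if k < 2:
            coefficients.append(0.0)
            continue
        links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
        coefficients.append(2 * links / (k * (k - 1)))
    return sum(coefficients) / len(coefficients)

# A triangle 0-1-2 with one extra node 3 attached to node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
acpl = average_path_length(adj)
acc = average_clustering(adj)
```

For this small graph the ACPL is 4/3 and the ACC is 7/12; a small-world network is one that keeps the first value low while keeping the second value high.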

Table 3.

Analysis of small-world characteristics for parent network and its subnet.

Type	Size	ACPL	ACC
Parent	1000	3.305	0.446
Subnet1	121	2.159	0.418
Subnet2	114	2.137	0.470
Subnet3	112	2.246	0.356
Subnet4	108	2.141	0.447
Subnet5	97	2.063	0.454
Subnet6	83	2.065	0.475
Subnet7	101	2.143	0.488
Subnet8	85	1.967	0.547
Subnet9	87	2.027	0.526
Subnet10	92	2.127	0.419

ACPL: average characteristic path length; ACC: average clustering coefficient.

Table 4.

Analysis of small-world characteristics under different cluster head numbers.

Cluster head numbers	ACPL	ACC
8	4.5096	0.2965
9	3.8051	0.3137
10	3.3054	0.4458
11	3.5953	0.3903
12	3.6691	0.3545

ACPL: average characteristic path length; ACC: average clustering coefficient.

Table 5.

Analysis of small-world characteristics under different networks.

Type	ACPL	ACC
AD network	3.3054	0.4458
Random network	3.2668	0.0112
Small-world network	5.8514	0.4065
Highly clustered	3.7692	0.2303

ACPL: average characteristic path length; ACC: average clustering coefficient.

It can be seen from Table 3 that the ACPL of the parent network and of each subnet is small and the ACC is large, which indicates that they all have small-world characteristics. The number of members and the small-world characteristics are similar across subnets, which indicates that the structure of the AD network is hierarchical and uniformly clustered in terms of small-world characteristics. It can be seen from Table 4 that the ACPL reaches its minimum and the ACC reaches its maximum when the number of cluster heads is 10. Therefore, the small-world characteristic of the AD network is the best when the number of cluster heads is 10. It can be seen from Table 5 that the ACPL of the AD network is smaller than that of the small-world network and the highly clustered scale-free network, and its ACC is larger than that of the random network, the small-world network, and the highly clustered scale-free network. Consequently, the small-world characteristic of the AD network is more significant.

Analysis of scale-free characteristics

The scale-free characteristics of a complex network can be characterized by whether the degrees of its neurons satisfy a power-law distribution.⁴² In the AD network, the degree of each neuron is calculated, and the number of neurons with each degree is counted; the resulting distribution is shown in Figure 6. The distribution is processed logarithmically and fitted linearly, and then the correlation coefficient $R$ is calculated. The power-law distribution is considered to be satisfied if $| R | \geq 0.95$. The logarithmic relationship between the number of neurons and the degree of neurons, together with the fitted line, is shown in Figure 7. By calculation, the correlation coefficient $R$ is 0.986, which indicates that the AD network has scale-free characteristics.

Figure 6.

Plot of the number of neurons versus degree.

Figure 7.

Log-log plot of the number of neurons versus degree.
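The power-law check can be sketched as the Pearson correlation between the logarithm of the degree and the logarithm of the number of neurons with that degree. The function name and the synthetic degree list are hypothetical; the list is built so that the count is exactly $64 / d^{2}$, which makes the log-log relation a straight line.

```python
import math
from collections import Counter

def loglog_correlation(degrees):
    """Correlation coefficient R of log(count) against log(degree)."""
    counts = Counter(d for d in degrees if d > 0)
    xs = [math.log(d) for d in counts]
    ys = [math.log(c) for c in counts.values()]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# 64 neurons of degree 1, 16 of degree 2, 4 of degree 4, 1 of degree 8.
degrees = [1] * 64 + [2] * 16 + [4] * 4 + [8]
r = loglog_correlation(degrees)
```

The fitted line has a negative slope, so $R$ itself is negative for a power law; the criterion $| R | \geq 0.95$ uses its absolute value.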

In addition, the correlation coefficients $R$ of the parent network and its subnets are calculated separately. It can be seen from Table 6 that all the correlation coefficients $R$ are larger than 0.95, which indicates that they all have scale-free characteristics. The subnets also have similar numbers of members, which further indicates that the structure of the AD network is hierarchical and uniformly clustered in terms of the scale-free characteristics.

Table 6.

Analysis of scale-free characteristics for parent network and its subnet.

Type	Size	R
Parent	1000	0.9860
Subnet1	121	0.9862
Subnet2	114	0.9874
Subnet3	112	0.9903
Subnet4	108	0.9921
Subnet5	97	0.9879
Subnet6	83	0.9861
Subnet7	101	0.9936
Subnet8	85	0.9875
Subnet9	87	0.9893
Subnet10	92	0.9848

Analysis of prediction

Dataset preparation and testing criterion

The MG time series and Lorenz time series were used as the nonlinear time series datasets for prediction. They are generated as follows:

1. The chaotic dynamic formula of the MG system is as follows:

\frac{dx}{dt} = \frac{0.2 x (t - τ)}{1 + x^{10} (t - τ)} - 0.1 x (t)

(11)

where $τ$ denotes the time delay. The greater the time delay $τ$, the stronger the nonlinearity of the system, which exhibits chaotic characteristics when $τ \geq 17$. The time delay is increased from 17 to 31, and the fourth-order Runge–Kutta algorithm is used to solve the MG system, which yields 15 nonlinear time series datasets. The first 2300 points of each dataset form the training set and the last 200 points form the test set.
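A possible way to generate one MG dataset is sketched below. Several choices are assumptions that the text does not specify: the exponent 10 of the standard MG benchmark, the step size `dt = 0.1`, sampling once per time unit, the constant initial history `x0 = 1.2`, and holding the delayed term fixed within each Runge–Kutta step.

```python
def mackey_glass(n_points, tau=17, dt=0.1, x0=1.2, sample_every=10):
    """Generate an MG series with RK4, holding x(t - tau) fixed within each step."""
    delay_steps = int(round(tau / dt))
    history = [x0] * (delay_steps + 1)       # constant history on [-tau, 0]

    def rhs(x, x_delayed):
        return 0.2 * x_delayed / (1.0 + x_delayed ** 10) - 0.1 * x

    series = []
    x = x0
    for step in range(n_points * sample_every):
        x_tau = history[-(delay_steps + 1)]  # value of x at time t - tau
        k1 = rhs(x, x_tau)
        k2 = rhs(x + 0.5 * dt * k1, x_tau)
        k3 = rhs(x + 0.5 * dt * k2, x_tau)
        k4 = rhs(x + dt * k3, x_tau)
        x += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        history.append(x)
        if (step + 1) % sample_every == 0:
            series.append(x)
    return series

series = mackey_glass(2500, tau=17)
train, test = series[:2300], series[2300:]
```

Calling `mackey_glass(2500, tau=t)` for `t` from 17 to 31 would give the 15 datasets described above, each split into 2300 training points and 200 test points.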

2. The chaotic dynamic formula of the Lorenz system is as follows:

\left\{ \begin{matrix} \frac{dx}{dt} = 10 (- x + y) \\ \frac{dy}{dt} = 28 x - y - xz \\ \frac{dz}{dt} = xy - \frac{8}{3} z \end{matrix} \right.

(12)

The Lorenz system is solved by the fourth-order Runge–Kutta algorithm to produce a time series of 2500 points. The first 2300 points form the training set and the last 200 points form the test set.
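The Lorenz dataset can be produced with a plain fourth-order Runge–Kutta integrator, as sketched below. The step size `dt = 0.01`, the initial state `(1, 1, 1)`, and the use of the $x$ component as the series are assumptions, since the text does not state them.

```python
def lorenz_rhs(state):
    """Right-hand side of the Lorenz system in Eq. (12)."""
    x, y, z = state
    return (10.0 * (y - x), 28.0 * x - y - x * z, x * y - (8.0 / 3.0) * z)

def rk4_step(f, state, dt):
    """Advance state by one classical fourth-order Runge-Kutta step."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(s + dt * (a + 2 * b + 2 * c + d) / 6.0
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def lorenz_series(n_points, dt=0.01, initial=(1.0, 1.0, 1.0)):
    """Return the x component of the trajectory after each of n_points steps."""
    state, xs = initial, []
    for _ in range(n_points):
        state = rk4_step(lorenz_rhs, state, dt)
        xs.append(state[0])
    return xs

series = lorenz_series(2500)
train, test = series[:2300], series[2300:]
```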

The normalized root mean square error (NRMSE) is the performance indicator for all simulation predictions:

NRMSE = {(\frac{\sum_{l = 1}^{L} \sum_{m = n_{t} + 1}^{n_{t} + n_{c}} {(y_{d}^{l} (m) - y^{l} (m))}^{2}}{\sum_{l = 1}^{L} \sum_{m = n_{t} + 1}^{n_{t} + n_{c}} {(y_{d}^{l} (m))}^{2}})}^{1 / 2}

(13)

where $L$ denotes the number of independent repeat tests (this experiment used 100); $n_{t}$ and $n_{c}$ are the lengths of the training set and the test set, respectively; $y_{d}^{l} (m)$ denotes the true value at the $m$th point in the $l$th independent experiment; and $y^{l} (m)$ denotes the predicted value at the $m$th point in the $l$th independent experiment.
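Equation (13) pools the squared errors over all repeat tests and all test-set points before taking the ratio and the square root. A minimal sketch follows, in which each inner list holds the test-set values of one independent run; the function name and the toy numbers are hypothetical.

```python
import math

def nrmse(true_runs, predicted_runs):
    """NRMSE of Eq. (13), pooled over every run and every test-set point."""
    numerator = sum((yd - y) ** 2
                    for true, pred in zip(true_runs, predicted_runs)
                    for yd, y in zip(true, pred))
    denominator = sum(yd ** 2 for true in true_runs for yd in true)
    return math.sqrt(numerator / denominator)

true_runs = [[1.0, 2.0], [2.0, 1.0]]        # two runs, two test points each
predicted_runs = [[1.0, 1.0], [2.0, 2.0]]
error = nrmse(true_runs, predicted_runs)
```

A perfect prediction gives 0, and predicting zero everywhere gives 1, so values well below 1 indicate that the model captures most of the signal energy.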

Analysis of echo state property

Generally, the prediction model can be trained and can predict stably only when the reservoir has the “echo state” property. After the synaptic matrix is normalized, the spectral radius of the synaptic matrix $W_{res}$ is the largest absolute eigenvalue $| λ_{max} |$, and it is used to measure the intensity of the “echo state” of the reservoir. For a randomly connected ESN, the spectral radius must satisfy $| λ_{max} | < 1$ for the reservoir to have the “echo state.” To analyze the “echo state” of the AD-SSESN, it is tested on the MG dataset ($τ = 17$) and the Lorenz dataset under different spectral radii, and the prediction results are compared with those of the ESN in Figures 8 and 9.
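Testing a reservoir at a chosen spectral radius amounts to rescaling $W_{res}$ by a scalar. The sketch below shows this common rescaling with NumPy; the function name is hypothetical, and the article does not state that this exact normalization was used.

```python
import numpy as np

def scale_to_spectral_radius(W, target):
    """Rescale W so that its largest absolute eigenvalue equals target."""
    radius = np.max(np.abs(np.linalg.eigvals(W)))
    return W * (target / radius)

W = np.array([[2.0, 0.0],
              [0.0, -3.0]])                  # eigenvalues 2 and -3, radius 3
W_scaled = scale_to_spectral_radius(W, 0.9)
new_radius = np.max(np.abs(np.linalg.eigvals(W_scaled)))
```

Sweeping `target` over a range of values and recording the NRMSE at each one gives the kind of error-versus-spectral-radius curve used for this comparison.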

Figure 8.

NRMSE error versus spectral radius on MG dataset: (a) testing ESN and (b) testing AD-SSESN.

Figure 9.

NRMSE error versus spectral radius on Lorenz dataset: (a) testing ESN and (b) testing AD-SSESN.

It can be seen from Figures 8 and 9 that, on both the MG dataset and the Lorenz dataset, the spectral radius at which the NRMSE begins to increase significantly is markedly larger for the AD-SSESN than for the ESN. Therefore, the “echo state” of the AD-SSESN is significantly enhanced: the AD-SSESN maintains stable predictions over a wider range of spectral radii, and its predictive power is enhanced.

Approximating nonlinear capability

The MG dataset and Lorenz dataset are preprocessed through normalization and phase space reconstruction, and the prediction results are de-normalized. The prediction results of AD-SSESN for MG dataset and Lorenz dataset are shown in Figures 10 and 11.
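The preprocessing pipeline can be sketched as below. The text does not give the normalization method, embedding dimension, delay, or prediction horizon, so min-max scaling and the values `dim=3`, `delay=2`, `horizon=1` are assumptions chosen only to keep the example small.

```python
def minmax_fit(series):
    """Return the (min, max) pair used for scaling."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("constant series cannot be min-max normalized")
    return lo, hi


def normalize(series, lo, hi):
    """Map values into [0, 1]."""
    return [(v - lo) / (hi - lo) for v in series]


def denormalize(series, lo, hi):
    """Invert normalize() so predictions return to the original scale."""
    return [v * (hi - lo) + lo for v in series]


def delay_embed(series, dim, delay, horizon=1):
    """Phase space reconstruction by delay embedding.

    input  = [s(t), s(t - delay), ..., s(t - (dim - 1) * delay)]
    target = s(t + horizon)
    """
    start = (dim - 1) * delay
    inputs, targets = [], []
    for t in range(start, len(series) - horizon):
        inputs.append([series[t - k * delay] for k in range(dim)])
        targets.append(series[t + horizon])
    return inputs, targets


raw = [float(v) for v in range(10)]  # toy series 0, 1, ..., 9
lo, hi = minmax_fit(raw)
scaled = normalize(raw, lo, hi)
inputs, targets = delay_embed(scaled, dim=3, delay=2)
print(len(inputs))  # 5
```

In practice the scaling bounds would be fitted on the training set only and reused for the test set, so that no test information leaks into training.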

Figure 10.

Prediction results of AD-SSESN for MG ( $τ = 17$ ) dataset.

Figure 11.

Prediction results of AD-SSESN for Lorenz dataset.

It can be seen from Figures 10 and 11 that the curves predicted by the AD-SSESN for the MG dataset and the Lorenz dataset follow the trend of the actual curves, which indicates that the AD-SSESN has a high fitting ability.

To analyze the error further, small-world scale-free prediction models (X-SSESN, where X is the number of cluster heads) are constructed according to the different clustering schemes in Table 4, and each model is used to predict the 15 MG datasets and the Lorenz dataset. The results are shown in Figure 12 and Table 7.

Figure 12.

Prediction results of different cluster head numbers in the MG dataset.

Table 7.

Prediction results of different cluster head numbers in the Lorenz dataset.

Type	NRMSE
8-SSESN	0.02015
9-SSESN	0.01245
10-SSESN	0.00355
11-SSESN	0.00587
12-SSESN	0.00945

NRMSE: normalized root mean square error.

It can be seen from Figure 12 and Table 7 that the AD-SSESN has the minimum NRMSE on the 15 MG datasets and on the Lorenz dataset. In Table 7, the minimum NRMSE of 0.00355 is obtained by the 10-SSESN, which equals the AD-SSESN value reported in Table 8. Furthermore, on the MG datasets, the prediction accuracy is maintained as the MG time delay increases.

Finally, four prediction models (AD-SSESN, ESN, SWESN, and SHESN) with the same reservoir size, the same sparse connectivity, and an appropriate spectral radius are constructed according to the different reservoirs in Table 5. The 15 MG datasets and the Lorenz dataset are predicted by these models, and the results are shown in Figure 13 and Table 8.

Figure 13.

Prediction results of different networks in the MG dataset.

Table 8.

Prediction results of different networks in the Lorenz dataset.

Type	NRMSE
ESN	0.01565
SWESN	0.01156
SHESN	0.00615
AD-SSESN	0.00355

NRMSE: normalized root mean square error; ESN: echo state network; SWESN: small-world echo state network; SHESN: scale-free highly clustered echo state network.

It can be seen from Figure 13 and Table 8 that the AD-SSESN achieves better prediction results than the other models on the MG datasets with different time delays and on the Lorenz dataset. On the MG datasets, the AD-SSESN maintains good prediction performance when $17 \leq τ \leq 24$ . When $τ > 24$ , the nonlinearity of the MG dataset is stronger, and the NRMSE of the ESN, SWESN, and SHESN increases rapidly, while the NRMSE of the AD-SSESN increases relatively slowly. On the Lorenz dataset, the AD-SSESN has the minimum NRMSE (0.00355, compared with 0.00615 for the SHESN, 0.01156 for the SWESN, and 0.01565 for the ESN) and thus the best prediction performance. To sum up, the clustering performance of the reservoir is optimized and the ACC of the network is improved in the AD-SSESN. In addition, the “echo state” of the AD-SSESN is significantly enhanced because its feasible range of spectral radii is extended, so that highly complex nonlinear dynamic systems can be fitted by the AD-SSESN.

Conclusion

This article proposes a nonlinear time series prediction algorithm based on the AD-SSNET, which improves the prediction accuracy for nonlinear time series data and brings more possibilities for the analysis of the large number of time series in the wireless Internet of Things. The optimal number of cluster heads is obtained adaptively and the clustering scheme is optimized by the AIC-DSCA; the AD-SSNET with small-world scale-free characteristics is then constructed by the intra-cluster priority connection algorithm. This network is used as a reservoir to construct the AD-SSESN prediction model. Finally, the AD-SSESN prediction model is used to predict the MG datasets and the Lorenz dataset. Experimental results show that the AD-SSESN has the minimum NRMSE compared with the other small-world scale-free network prediction models with different clustering schemes, and also compared with the other three prediction models with different reservoir networks. These results show that highly complex nonlinear dynamic systems are approximated more accurately. The prediction accuracy is improved because, in the AD-SSESN, the clustering performance of the reservoir is optimized, the ACC of the network is improved, and the “echo state” is significantly enhanced.

Footnotes

Handling Editor: Peio Lopez Iturri

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Public Welfare Technology Research Projects of Zhejiang Province of China (Grant Nos LGG20F010009 and GF21F010018), the Zhejiang Shuren University Basic Scientific Research Special Funds (No. 2020XZ009), and the Project Intelligentization and Digitization for Airline Revolution (No. 2018R02008).

ORCID iDs

Banteng Liu

Meng Han


Nonlinear time series prediction algorithm based on AD-SSNET for artificial intelligence–powered Internet of Things
