Dual possibilistic regression models of support vector machines and their application to power load forecasting

Abstract

Power load forecasting is essential to the safe, stable, and economical operation of power systems. Interval data are an appropriate way to represent the fuzzy information involved in power load forecasting. Dual possibilistic regression models approximate the observed interval data from the outside and from the inside, respectively, and can therefore estimate the inherent uncertainty of a given fuzzy phenomenon well. In this article, efficient dual possibilistic regression models of support vector machines based on solving a group of quadratic programming problems are proposed. Because each quadratic programming problem contains fewer optimization variables, the proposed approach trains quickly. Compared with other interval regression approaches based on support vector machines, namely the quadratic loss support vector machine approach and the support vector machine approach based on two smaller quadratic programming problems, the proposed approach is more efficient on several artificial datasets and a power load dataset.

Keywords

Interval data; dual possibilistic regression models; fuzzy regression analysis; support vector machine; quadratic programming problem; power load

Introduction

With the rapid development of wireless sensor networks,^1–6 the Internet of things,^7–17 and machine learning,^18–21 the human ability to perceive nature and analyze data is becoming ever stronger. Power load forecasting is a typical application of these technologies: it uses effective methods, based on historical data and on natural and social conditions, to determine future load values.^22–24 However, power load varies over time and fluctuates, which makes it difficult to represent by an exact value. Interval data are widely used to represent such uncertain and imprecise information. As an important tool for handling interval data, fuzzy regression analysis based on interval regression is widely used to predict interval-valued dependent variables as functions of independent variables, and it has been employed to forecast power load.²⁵

Generally speaking, interval regression models can be roughly classified into two categories. The first is the least squares approach, which minimizes the distance between the estimated intervals and the observed intervals.^26–29 The second is the possibilistic regression approach, which minimizes the total vagueness of the estimated model subject to inclusion relationships between the observed intervals and the estimated intervals.³⁰ Possibilistic regression models can in turn be divided into two kinds according to the number of estimated output intervals. The first kind is the dual possibilistic regression approach, which has two estimated interval outputs: one is the largest of all possible estimated intervals included in the observed interval, and the other is the smallest estimated interval that includes the observed interval.³¹ The second kind is the possibilistic regression model with a single interval output that includes the observed interval or crisp value.^32–36 The first possibilistic regression model was proposed by Tanaka et al.³⁷ and developed further by his research team;^38–40 in it, the coefficients of the interval regression model are assumed to be intervals. In their early studies, linear programming (LP) was employed to calculate the coefficients of the interval regression model. However, because of the characteristics of LP, some coefficients become crisp. To overcome this problem, quadratic programming (QP) was incorporated into the possibilistic regression model, and the noncrisp coefficients obtained by QP are more desirable than those obtained by LP. Possibilistic regression models based on LP and QP nevertheless have limitations. First, because they are built by minimizing the empirical risk, these models are prone to overfitting the training dataset, which harms prediction performance. Second, they have difficulty with nonlinear interval regression, because a nonlinear model must be chosen from an infinite number of alternatives.

Based on statistical learning theory, the support vector machine (SVM) has been very successful in pattern recognition and function estimation by solving a large quadratic programming problem (QPP).⁴¹ Consequently, several researchers have developed SVM-based interval regression approaches for fuzzy data. Hong and Hwang⁴² proposed a quadratic loss SVM for regression analysis with crisp inputs and interval outputs; it is a model-free approach because the algorithm uses kernel functions to handle nonlinear regression problems. However, this approach is very sensitive to outliers. Hwang et al.⁴³ proposed the support vector interval regression machine (SVIRM) for regression analysis with crisp inputs and crisp outputs, using the ε-insensitive loss function to reduce the impact of outliers. Jeng et al.⁴⁴ proposed support vector interval regression networks (SVIRNs) to overcome the slow convergence of neural networks: a classical support vector regression (SVR) is used to obtain the initial structure of the networks, and the upper and lower values of the estimated interval are identified by two networks. However, the above approaches can only handle crisp input data. Chuang⁴⁵ therefore proposed interval support vector interval regression networks (ISVIRNs) for interval inputs and interval outputs, in which the distance between two intervals is measured by the Hausdorff distance and an SVR is trained to determine the initial structure of the networks. Hao⁴⁶ proposed the v-support vector interval regression network (v-SVIRN) to estimate interval linear and nonlinear regression models from crisp input and crisp output data. This model automatically adjusts a parameter-insensitive region of arbitrary shape that contains all crisp outputs, which removes the need to choose an appropriate width for the insensitive tube in traditional SVR.
Xu et al.⁴⁷ proposed an extended SVM algorithm for estimating asymmetrical intervals, in which each end of the interval is described independently and exactly, without influence from the other end. A significant drawback of SVM is its high computational complexity, because training requires solving a QPP: the time complexity of SVM training is $O (n^{3})$, where n is the total number of training samples. To reduce the training time of SVM, Jayadeva et al.⁴⁸ proposed the twin support vector machine (Twin-SVM), whose training solves two smaller QPPs instead of a single larger one. Each smaller QPP involves about half of the training data, so the training time of Twin-SVM is approximately one-quarter of that of SVM ($2 (n/2)^{3} = n^{3}/4$). Similarly, Peng et al.⁴⁹ proposed interval twin support vector regression (ITSVR), which also optimizes two smaller QPPs: one estimates the upper values of the interval outputs and the other the lower values. Hao⁵⁰ estimated the upper model and the lower model by solving two smaller SVM-type QPPs, following the same strategy as Twin-SVM and ITSVR, and thereby reduced the computational cost of training the regression model.

In this article, we consider dual possibilistic regression models with multiple crisp inputs and dual estimated interval outputs. Twin-SVM, ITSVR, and Hao’s two smaller QPP approach show that solving several smaller QPPs, rather than a single larger QPP, significantly reduces the training time of SVM-based regression models. Motivated by this strategy, the proposed approach employs a group of QPPs to estimate the two boundaries of the upper model and the two boundaries of the lower model, which yields fast training. Moreover, slack variables are incorporated into the approach to reduce the influence of outliers on the regression models.

The remainder of this article is organized as follows. Brief reviews of Hong’s quadratic loss SVM approach and Hao’s two smaller QPP SVM approach are presented in section “Background.” We propose interval regression models of SVM based on solving a group of QPPs in section “Dual possibilistic interval regression of SVMs based on a group of smaller QPPs.” The experimental results are reported in section “Experiments,” and some concluding remarks are made in section “Conclusion.”

Background

In this section, we briefly introduce dual interval regression models based on Hong’s quadratic loss SVM approach and Hao’s two smaller QPP SVM approach.

Hong’s quadratic loss SVM approach

Suppose we are given a fuzzy training dataset $\{(x_{1}, Y_{1}), (x_{2}, Y_{2}), \dots, (x_{n}, Y_{n})\}$, where $x_{i} = (1, x_{i 1}, \dots, x_{im})^{t}$ denotes the crisp real-valued input and $Y_{i}$ is the corresponding observed interval, expressed as $Y_{i} = (y_{i}, e_{i})$, where $y_{i}$ is its center and $e_{i}$ its radius. Similarly, an interval coefficient $A_{j}$ is denoted $A_{j} = (a_{j}, c_{j})$. An interval linear model is then expressed as

\begin{matrix} Y (x_{i}) = (a_{0}, c_{0}) + (a_{1}, c_{1}) x_{i 1} + \dots + (a_{m}, c_{m}) x_{i m} \\ = (a_{0} + a_{1} x_{i 1} + \dots + a_{m} x_{i m}, c_{0} + c_{1} | x_{i 1} | + \dots + c_{m} | x_{i m} |) = (a^{t} x_{i}, c^{t} | x_{i} |) \end{matrix}

(1)

The upper model $Y^{*} (x_{i})$ and the lower model $Y_{*} (x_{i})$ can be written, respectively, as

Y^{*} (x_{i}) = (a^{t} x_{i}, c^{t} | x_{i} | + d^{t} | x_{i} |)

(2)

Y_{*} (x_{i}) = (a^{t} x_{i}, c^{t} | x_{i} |)

(3)

This regression model is constructed by solving the following QPP

\begin{matrix} \min \frac{1}{2} ({‖ a ‖}^{2} + {‖ c ‖}^{2} + {‖ d ‖}^{2}) \\ + \frac{C}{2} (\sum_{i = 1}^{n} ξ_{1 i}^{2} + \sum_{i = 1}^{n} ξ_{2 i}^{2} + \sum_{i = 1}^{n} ξ_{2 i}^{* 2}) \end{matrix}

(4a)

subject to

d^{t} | x_{i} | \leq ξ_{1 i}

(4b)

y_{i} - a^{t} x_{i} \leq ξ_{2 i}

(4c)

a^{t} x_{i} - y_{i} \leq ξ_{2 i}^{*}

(4d)

y_{i} + e_{i} \leq a^{t} x_{i} + c^{t} | x_{i} | + d^{t} | x_{i} |

(4e)

a^{t} x_{i} - c^{t} | x_{i} | - d^{t} | x_{i} | \leq y_{i} - e_{i}

(4f)

a^{t} x_{i} + c^{t} | x_{i} | \leq y_{i} + e_{i}

(4g)

y_{i} - e_{i} \leq a^{t} x_{i} - c^{t} | x_{i} |

(4h)

where $‖ a ‖^{2} + ‖ c ‖^{2} + ‖ d ‖^{2}$ is a regularization term that controls the complexity of the upper and lower models, and C controls the trade-off between the flatness of $Y (x)$ and the amount of deviation tolerated in the quadratic loss SVM. Constraints (4e)–(4h) ensure $Y_{*} (x_{i}) \subseteq Y (x_{i}) \subseteq Y^{*} (x_{i})$ for all $i \in \{1, \dots, n\}$ in the training dataset. However, because these constraints contain no slack variables, they make the quadratic loss SVM sensitive to outliers.
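To make constraints (4e)–(4h) concrete, the following minimal sketch evaluates the upper model (2) and the lower model (3) for one sample and checks the inclusion $Y_{*} (x_{i}) \subseteq Y (x_{i}) \subseteq Y^{*} (x_{i})$. It only checks a given parameter set; it does not solve QPP (4). The function names are hypothetical, and intervals are returned as (low, high) pairs.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(u, v))


def hong_intervals(a, c, d, x):
    """Upper model (2) and lower model (3) at input x, each as a (low, high) pair.

    x includes the leading constant 1, as in x_i = (1, x_i1, ..., x_im).
    """
    abs_x = [abs(v) for v in x]
    center = dot(a, x)                   # a^t x
    r_lower = dot(c, abs_x)              # c^t |x|
    r_upper = r_lower + dot(d, abs_x)    # c^t |x| + d^t |x|
    upper = (center - r_upper, center + r_upper)
    lower = (center - r_lower, center + r_lower)
    return upper, lower


def satisfies_inclusion(a, c, d, x, y, e):
    """True when constraints (4e)-(4h) hold for the observed interval (y, e)."""
    (up_lo, up_hi), (lo_lo, lo_hi) = hong_intervals(a, c, d, x)
    return (up_hi >= y + e        # (4e)
            and up_lo <= y - e    # (4f)
            and lo_hi <= y + e    # (4g)
            and lo_lo >= y - e)   # (4h)
```

For example, with a = (1, 2), c = (0.1, 0.1), d = (0.5, 0), and x = (1, 1), the upper model is [2.3, 3.7] and the lower model is [2.8, 3.2], so the observed interval y = 3, e = 0.4 (that is, [2.6, 3.4]) satisfies all four constraints, whereas y = 3, e = 0.1 violates (4g).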

Hao’s two smaller QPP SVM approach

In Hao’s approach, the upper model $Y^{*} (x_{i})$ and the lower model $Y_{*} (x_{i})$ are written as

Y^{*} (x_{i}) = (a^{t} x_{i}, h^{t} | x_{i} |)

(5)

Y_{*} (x_{i}) = (a^{t} x_{i}, h^{t} | x_{i} | - k^{t} | x_{i} |)

(6)

Hao reformulated $Y^{*} (x_{i})$ and $Y_{*} (x_{i})$ because, in Hong’s approach, a smaller value of C lowers the complexity of both $Y^{*} (x_{i})$ and $Y_{*} (x_{i})$ and thereby shrinks the spreads of the upper and lower models simultaneously, which can degrade the regression. In Hao’s approach, the upper model is estimated by solving the following QPP

\begin{matrix} \min \frac{1}{2} ‖ a ‖^{2} + \frac{C_{1}}{2} ‖ h ‖^{2} + C_{2} \sum_{i = 1}^{n} (ξ_{1 i} + ξ_{1 i}^{*} + ξ_{2 i} + ξ_{2 i}^{*}) \\ \text{subject to} \quad {\begin{matrix} a^{t} x_{i} + h^{t} | x_{i} | \geq y_{i} + e_{i} - ξ_{1 i} \\ y_{i} - e_{i} \geq a^{t} x_{i} - h^{t} | x_{i} | - ξ_{1 i}^{*} \\ y_{i} + e_{i} - ε \geq a^{t} x_{i} - ξ_{2 i} \\ a^{t} x_{i} \geq y_{i} - e_{i} + ε - ξ_{2 i}^{*} \\ ξ_{1 i}, ξ_{1 i}^{*}, ξ_{2 i}, ξ_{2 i}^{*} \geq 0, \quad i = 1, \dots, n \end{matrix}} \end{matrix}

(7)

Minimizing $‖ h ‖^{2}$ reduces not only the complexity of the upper model but also its spread. The slack variables $ξ_{1 i}$, $ξ_{1 i}^{*}$, $ξ_{2 i}$, and $ξ_{2 i}^{*}$ measure the degree to which the inclusion conditions are violated, so the approach can handle noisy training data well. The center $a^{t} x_{i}$ is constrained to lie between $y_{i} - e_{i} + ε$ and $y_{i} + e_{i} - ε$, which keeps the spread of the lower model greater than zero. The lower model of Hao’s approach is estimated by solving the following QPP

\begin{matrix} \min \frac{C_{3}}{2} ‖ k ‖^{2} + C_{4} \sum_{i = 1}^{n} (ξ_{3 i} + ξ_{3 i}^{*}) \\ \text{subject to} \quad {\begin{matrix} y_{i} + e_{i} \geq {\bar{Y}}_{i}^{*} - k^{t} | x_{i} | - ξ_{3 i} \\ {\underline{Y}}_{i}^{*} + k^{t} | x_{i} | \geq y_{i} - e_{i} - ξ_{3 i}^{*} \\ k^{t} | x_{i} | \leq R_{Y_{i}^{*}} \\ ξ_{3 i}, ξ_{3 i}^{*} \geq 0, \quad i = 1, \dots, n \end{matrix}} \end{matrix}

(8)

where ${\bar{Y}}_{i}^{*} = a^{t} x_{i} + h^{t} | x_{i} |$ , ${\underline{Y}}_{i}^{*} = a^{t} x_{i} - h^{t} | x_{i} |$ , and $R_{Y_{i}^{*}} = h^{t} | x_{i} |$ . Minimizing $‖ k ‖^{2}$ reduces the distance between the upper model and the lower model, and therefore increases the spread of the lower model.
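Because (5) and (6) share the center $a^{t} x_{i}$, the ends of the lower model are ${\bar{Y}}_{i}^{*} - k^{t} | x_{i} |$ and ${\underline{Y}}_{i}^{*} + k^{t} | x_{i} |$: the vector k shrinks the upper model by the same amount from both sides, and the constraint $k^{t} | x_{i} | \leq R_{Y_{i}^{*}}$ keeps the lower spread nonnegative. The following minimal sketch (hypothetical function name) evaluates both models for given a, h, and k; it does not solve (7) or (8).

```python
def hao_intervals(a, h, k, x):
    """Hao's upper model (5) and lower model (6) at input x, as (low, high) pairs."""
    abs_x = [abs(v) for v in x]
    center = sum(p * q for p, q in zip(a, x))        # a^t x
    r_upper = sum(p * q for p, q in zip(h, abs_x))   # R_{Y*} = h^t |x|
    shrink = sum(p * q for p, q in zip(k, abs_x))    # k^t |x|
    if shrink > r_upper:
        # Violates k^t |x| <= R_{Y*}, the third constraint of (8).
        raise ValueError("k^t|x| exceeds h^t|x|; the lower spread would be negative")
    r_lower = r_upper - shrink                       # h^t |x| - k^t |x|
    upper = (center - r_upper, center + r_upper)
    lower = (center - r_lower, center + r_lower)
    return upper, lower
```

For a = (2, 1), h = (0.5, 0.5), k = (0.25, 0.25), and x = (1, −2), the upper model is [−1.5, 1.5] and the lower model is [−0.75, 0.75]; each end of the lower model lies $k^{t} | x | = 0.75$ inside the corresponding end of the upper model.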

Dual possibilistic interval regression of SVMs based on a group of smaller QPPs

Interval linear regression

As mentioned earlier, solving two smaller QPPs rather than a single larger QPP can significantly reduce the training time of interval regression models, and the smaller each QPP is, the faster training becomes. Thus, in this section, we propose efficient dual possibilistic interval regression models of SVMs that employ a group of QPPs, each of which is smaller than the QPPs in the above approaches. In the proposed approach, the upper model $Y^{*}$ and the lower model $Y_{*}$ are written as follows:

Upper model

{\bar{Y}}_{i}^{*} = w_{1}^{t} x_{i}

(9)

{\underline{Y}}_{i}^{*} = w_{4}^{t} x_{i}

(10)

Lower model

{\bar{Y}}_{* i} = w_{2}^{t} x_{i}

(11)

{\underline{Y}}_{* i} = w_{3}^{t} x_{i}

(12)

where ${\bar{Y}}_{i}^{*}$ and ${\underline{Y}}_{i}^{*}$ are the high value and the low value of the estimated upper model, respectively, and ${\bar{Y}}_{* i}$ and ${\underline{Y}}_{* i}$ are those of the estimated lower model, respectively. Then, linear interval regression models are estimated by solving the following four QPPs

1.\ \min \frac{1}{2} ‖ w_{1} ‖^{2} + v_{1} \sum_{i = 1}^{n} w_{1}^{t} x_{i} + c_{1} \sum_{i = 1}^{n} ξ_{1 i}

(13a)

subject to

w_{1}^{t} x_{i} + ξ_{1 i} \geq y_{i} + e_{i}

(13b)

ξ_{1 i} \geq 0

(13c)

2.\ \min \frac{1}{2} ‖ w_{2} ‖^{2} - v_{2} \sum_{i = 1}^{n} w_{2}^{t} x_{i} + c_{2} \sum_{i = 1}^{n} ξ_{2 i}

(14a)

subject to

y_{i} + e_{i} + ξ_{2 i} \geq w_{2}^{t} x_{i}

(14b)

{\bar{Y}}_{i}^{*} - w_{2}^{t} x_{i} \geq 0

(14c)

ξ_{2 i} \geq 0

(14d)

3.\ \min \frac{1}{2} ‖ w_{3} ‖^{2} + v_{3} \sum_{i = 1}^{n} w_{3}^{t} x_{i} + c_{3} \sum_{i = 1}^{n} ξ_{3 i}

(15a)

subject to

w_{3}^{t} x_{i} + ξ_{3 i} \geq y_{i} - e_{i}

(15b)

{\bar{Y}}_{* i} - w_{3}^{t} x_{i} \geq 0

(15c)

ξ_{3 i} \geq 0

(15d)

4.\ \min \frac{1}{2} ‖ w_{4} ‖^{2} - v_{4} \sum_{i = 1}^{n} w_{4}^{t} x_{i} + c_{4} \sum_{i = 1}^{n} ξ_{4 i}

(16a)

subject to

y_{i} - e_{i} + ξ_{4 i} \geq w_{4}^{t} x_{i}

(16b)

{\underline{Y}}_{* i} - w_{4}^{t} x_{i} \geq 0

(16c)

ξ_{4 i} \geq 0

(16d)

where $v_{1}$, $v_{2}$, $v_{3}$, and $v_{4}$ are trade-off parameters between the flatness of each boundary and its estimated values. Minimizing $\sum_{i = 1}^{n} w_{1}^{t} x_{i}$ and $\sum_{i = 1}^{n} w_{3}^{t} x_{i}$ drives the high value of the upper model and the low value of the lower model to the smallest values that satisfy the constraints. Correspondingly, maximizing $\sum_{i = 1}^{n} w_{2}^{t} x_{i}$ and $\sum_{i = 1}^{n} w_{4}^{t} x_{i}$ drives the high value of the lower model and the low value of the upper model to the largest values that satisfy the constraints. As in Hao’s approach, the slack variables $ξ_{1 i}$, $ξ_{2 i}$, $ξ_{3 i}$, and $ξ_{4 i}$ measure the degree to which the inclusion conditions are violated. Constraints (14c), (15c), and (16c) enforce ${\underline{Y}}_{i}^{*} \leq {\underline{Y}}_{* i} \leq {\bar{Y}}_{* i} \leq {\bar{Y}}_{i}^{*}$ and, together with constraints (13b)–(16b), ensure $Y_{*} (x_{i}) \subseteq Y (x_{i}) \subseteq Y^{*} (x_{i})$ up to the tolerance allowed by the slack variables. Then, we can construct the Lagrange functions as follows

\begin{matrix} L_{1} = \frac{1}{2} ‖ w_{1} ‖^{2} + v_{1} \sum_{i = 1}^{n} w_{1}^{t} x_{i} + c_{1} \sum_{i = 1}^{n} ξ_{1 i} \\ - \sum_{i = 1}^{n} a_{1 i} (w_{1}^{t} x_{i} + ξ_{1 i} - y_{i} - e_{i}) - \sum_{i = 1}^{n} γ_{1 i} ξ_{1 i} \end{matrix}

(17)

\begin{matrix} L_{2} = \frac{1}{2} ‖ w_{2} ‖^{2} - v_{2} \sum_{i = 1}^{n} w_{2}^{t} x_{i} + c_{2} \sum_{i = 1}^{n} ξ_{2 i} \\ - \sum_{i = 1}^{n} a_{2 i} (y_{i} + e_{i} + ξ_{2 i} - w_{2}^{t} x_{i}) - \sum_{i = 1}^{n} a_{5 i} ({\bar{Y}}_{i}^{*} - w_{2}^{t} x_{i}) \\ - \sum_{i = 1}^{n} γ_{2 i} ξ_{2 i} \end{matrix}

(18)

\begin{matrix} L_{3} = \frac{1}{2} ‖ w_{3} ‖^{2} + v_{3} \sum_{i = 1}^{n} w_{3}^{t} x_{i} + c_{3} \sum_{i = 1}^{n} ξ_{3 i} \\ - \sum_{i = 1}^{n} a_{3 i} (w_{3}^{t} x_{i} + ξ_{3 i} - y_{i} + e_{i}) - \sum_{i = 1}^{n} a_{6 i} ({\bar{Y}}_{* i} - w_{3}^{t} x_{i}) \\ - \sum_{i = 1}^{n} γ_{3 i} ξ_{3 i} \end{matrix}

(19)

\begin{matrix} L_{4} = \frac{1}{2} ‖ w_{4} ‖^{2} - v_{4} \sum_{i = 1}^{n} w_{4}^{t} x_{i} + c_{4} \sum_{i = 1}^{n} ξ_{4 i} \\ - \sum_{i = 1}^{n} a_{4 i} (y_{i} - e_{i} + ξ_{4 i} - w_{4}^{t} x_{i}) - \sum_{i = 1}^{n} a_{7 i} ({\underline{Y}}_{* i} - w_{4}^{t} x_{i}) \\ - \sum_{i = 1}^{n} γ_{4 i} ξ_{4 i} \end{matrix}

(20)

where $a_{j i}$ ($j = 1, \dots, 7$) and $γ_{j i}$ ($j = 1, \dots, 4$) are nonnegative Lagrange multipliers. At the optimum, the partial derivatives of each Lagrange function with respect to the primal variables are zero, which gives

\frac{\partial L_{1}}{\partial w_{1}} = 0 \to w_{1} = \sum_{i = 1}^{n} (a_{1 i} - v_{1}) x_{i}

(21)

\frac{\partial L_{2}}{\partial w_{2}} = 0 \to w_{2} = \sum_{i = 1}^{n} (v_{2} - a_{2 i} - a_{5 i}) x_{i}

(22)

\frac{\partial L_{3}}{\partial w_{3}} = 0 \to w_{3} = \sum_{i = 1}^{n} (a_{3 i} - a_{6 i} - v_{3}) x_{i}

(23)

\frac{\partial L_{4}}{\partial w_{4}} = 0 \to w_{4} = \sum_{i = 1}^{n} (v_{4} - a_{4 i} - a_{7 i}) x_{i}

(24)

\frac{\partial L_{j}}{\partial ξ_{j i}} = 0 \to c_{j} - a_{j i} - γ_{j i} = 0, \quad j = 1, \dots, 4

(25)

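By duality theory, substituting equations (21)–(25) into equations (17)–(20) gives the dual problems of (13)–(16). Because $γ_{j i} \geq 0$, equation (25) implies $0 \leq a_{j i} \leq c_{j}$, and the terms involving $w_{j}$ collapse to $- \frac{1}{2} ‖ w_{j} ‖^{2}$, so the duals are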

\begin{matrix} 1.\ \max - \frac{1}{2} \sum_{i = 1}^{n} \sum_{j = 1}^{n} (a_{1 i} - v_{1}) (a_{1 j} - v_{1}) x_{i}^{t} x_{j} + \sum_{i = 1}^{n} a_{1 i} (y_{i} + e_{i}) \\ \text{subject to } 0 \leq a_{1 i} \leq c_{1} \end{matrix}

(26)

\begin{matrix} 2.\ \max - \frac{1}{2} \sum_{i = 1}^{n} \sum_{j = 1}^{n} (v_{2} - a_{2 i} - a_{5 i}) (v_{2} - a_{2 j} - a_{5 j}) x_{i}^{t} x_{j} \\ - \sum_{i = 1}^{n} a_{2 i} (y_{i} + e_{i}) - \sum_{i = 1}^{n} a_{5 i} {\bar{Y}}_{i}^{*} \\ \text{subject to } 0 \leq a_{2 i} \leq c_{2}, \ a_{5 i} \geq 0 \end{matrix}

(27)

\begin{matrix} 3.\ \max - \frac{1}{2} \sum_{i = 1}^{n} \sum_{j = 1}^{n} (a_{3 i} - a_{6 i} - v_{3}) (a_{3 j} - a_{6 j} - v_{3}) x_{i}^{t} x_{j} \\ - \sum_{i = 1}^{n} a_{3 i} (- y_{i} + e_{i}) - \sum_{i = 1}^{n} a_{6 i} {\bar{Y}}_{* i} \\ \text{subject to } 0 \leq a_{3 i} \leq c_{3}, \ a_{6 i} \geq 0 \end{matrix}

(28)

\begin{matrix} 4.\ \max - \frac{1}{2} \sum_{i = 1}^{n} \sum_{j = 1}^{n} (v_{4} - a_{4 i} - a_{7 i}) (v_{4} - a_{4 j} - a_{7 j}) x_{i}^{t} x_{j} \\ - \sum_{i = 1}^{n} a_{4 i} (y_{i} - e_{i}) - \sum_{i = 1}^{n} a_{7 i} {\underline{Y}}_{* i} \\ \text{subject to } 0 \leq a_{4 i} \leq c_{4}, \ a_{7 i} \geq 0 \end{matrix}

(29)

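Solving the above four QPPs yields the Lagrange multipliers $a_{j i}$ ($j = 1, \dots, 7$), which give the weight vectors $w_{1}, \dots, w_{4}$ as linear combinations of the $x_{i}$ through equations (21)–(24); the boundaries then follow from equations (9)–(12). Because problem (27) uses ${\bar{Y}}_{i}^{*}$ from problem (26), problem (28) uses ${\bar{Y}}_{* i}$ from problem (27), and problem (29) uses ${\underline{Y}}_{* i}$ from problem (28), the four QPPs are solved sequentially.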
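Problem (26) is a concave quadratic maximization over the box $0 \leq a_{1 i} \leq c_{1}$, so any QP solver applies. The article does not say which solver was used in MATLAB; the following minimal sketch substitutes plain projected gradient ascent and then recovers $w_{1}$ through equation (21). The function name and the iteration count are hypothetical. Problems (27)–(29) could be handled the same way, with the second block of multipliers ($a_{5 i}$, $a_{6 i}$, $a_{7 i}$) projected only onto $a \geq 0$ and the previously computed boundaries entering the linear term.

```python
def solve_dual_26(X, y, e, v1, c1, iters=10000):
    """Projected gradient ascent on dual problem (26).

    X: list of input vectors x_i (each including the constant 1);
    y, e: centers and radii of the observed intervals.
    Returns (alpha, w1), with w1 recovered through equation (21).
    """
    n = len(X)
    # Gram matrix of inner products x_i^t x_j (use K(x_i, x_j) instead for (30)).
    gram = [[sum(p * q for p, q in zip(X[i], X[j])) for j in range(n)] for i in range(n)]
    target = [y[i] + e[i] for i in range(n)]
    # Step 1/L, where L = max absolute row sum bounds the largest eigenvalue.
    step = 1.0 / max(sum(abs(g) for g in row) for row in gram)
    alpha = [0.0] * n
    for _ in range(iters):
        beta = [a - v1 for a in alpha]
        # Gradient of the dual objective: (y_i + e_i) - sum_j (a_1j - v1) x_i^t x_j
        grad = [target[i] - sum(gram[i][j] * beta[j] for j in range(n)) for i in range(n)]
        # Ascent step, then projection onto the box 0 <= a_1i <= c1.
        alpha = [min(c1, max(0.0, alpha[i] + step * grad[i])) for i in range(n)]
    beta = [a - v1 for a in alpha]
    w1 = [sum(beta[i] * X[i][k] for i in range(n)) for k in range(len(X[0]))]
    return alpha, w1
```

As a check, take the three samples x = (1, 0), (1, 0.5), (1, 1) with $y_{i} + e_{i}$ = 1, 2, 2.5, $v_{1}$ = 0.1, and $c_{1}$ = 100. Solving the KKT conditions of (13) by hand gives $w_{1}$ = (1.5, 1.0) with multipliers (0, 1.3, 0.5), and the sketch converges to these values; the first sample lies strictly above its constraint, so its multiplier is zero.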

Interval nonlinear regression

The proposed algorithm can be extended to nonlinear interval regression by employing a kernel function $K (x_{i}, x_{j}) = \langle ϕ (x_{i}), ϕ (x_{j}) \rangle$, where the nonlinear map $ϕ : R^{d} \to F$ sends the training samples into a high-dimensional feature space F. Replacing each inner product $x_{i}^{t} x_{j}$ in problems (26)–(29) with $K (x_{i}, x_{j})$, the nonlinear interval regression model is estimated by solving the following four QPPs

\begin{matrix} 1.\ \max - \frac{1}{2} \sum_{i = 1}^{n} \sum_{j = 1}^{n} (a_{1 i} - v_{1}) (a_{1 j} - v_{1}) K (x_{i}, x_{j}) \\ + \sum_{i = 1}^{n} a_{1 i} (y_{i} + e_{i}) \\ \text{subject to } 0 \leq a_{1 i} \leq c_{1} \end{matrix}

(30)

\begin{matrix} 2.\ \max - \frac{1}{2} \sum_{i = 1}^{n} \sum_{j = 1}^{n} (v_{2} - a_{2 i} - a_{5 i}) (v_{2} - a_{2 j} - a_{5 j}) K (x_{i}, x_{j}) \\ - \sum_{i = 1}^{n} a_{2 i} (y_{i} + e_{i}) - \sum_{i = 1}^{n} a_{5 i} {\bar{Y}}_{i}^{*} \\ \text{subject to } 0 \leq a_{2 i} \leq c_{2}, \ a_{5 i} \geq 0 \end{matrix}

(31)

\begin{matrix} 3.\ \max - \frac{1}{2} \sum_{i = 1}^{n} \sum_{j = 1}^{n} (a_{3 i} - a_{6 i} - v_{3}) (a_{3 j} - a_{6 j} - v_{3}) K (x_{i}, x_{j}) \\ - \sum_{i = 1}^{n} a_{3 i} (- y_{i} + e_{i}) - \sum_{i = 1}^{n} a_{6 i} {\bar{Y}}_{* i} \\ \text{subject to } 0 \leq a_{3 i} \leq c_{3}, \ a_{6 i} \geq 0 \end{matrix}

(32)

\begin{matrix} 4.\ \max - \frac{1}{2} \sum_{i = 1}^{n} \sum_{j = 1}^{n} (v_{4} - a_{4 i} - a_{7 i}) (v_{4} - a_{4 j} - a_{7 j}) K (x_{i}, x_{j}) \\ - \sum_{i = 1}^{n} a_{4 i} (y_{i} - e_{i}) - \sum_{i = 1}^{n} a_{7 i} {\underline{Y}}_{* i} \\ \text{subject to } 0 \leq a_{4 i} \leq c_{4}, \ a_{7 i} \geq 0 \end{matrix}

(33)

It is worth noting that the kernel function extends the proposed model to nonlinear interval regression without increasing the time complexity of training, because each QPP keeps the same number of variables. Different kernel functions map the training samples into different high-dimensional feature spaces. The boundaries of the upper model and the lower model constructed by the proposed approach are computed as follows:

Nonlinear upper model

{\bar{Y}}_{j}^{*} = \sum_{i = 1}^{n} (a_{1 i} - v_{1}) K (x_{i}, x_{j})

(34)

{\underline{Y}}_{j}^{*} = \sum_{i = 1}^{n} (v_{4} - a_{4 i} - a_{7 i}) K (x_{i}, x_{j})

(35)

Nonlinear lower model

{\bar{Y}}_{* j} = \sum_{i = 1}^{n} (v_{2} - a_{2 i} - a_{5 i}) K (x_{i}, x_{j})

(36)

{\underline{Y}}_{* j} = \sum_{i = 1}^{n} (a_{3 i} - a_{6 i} - v_{3}) K (x_{i}, x_{j})

(37)
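Kernel expansions of this kind can be evaluated as in the following minimal sketch. It assumes the Gaussian kernel in the form $\exp (- ‖ u - v ‖^{2} / (2 σ^{2}))$, since the width parameterization is not stated in the text, and it reads the expansion coefficients of each boundary from equations (21)–(24), with the boundary assignment of (9)–(12). The function names and the dictionary keys "a1" to "a7" are hypothetical. Given multipliers from the four duals, it evaluates both intervals at a new input and checks the nesting that (14c)–(16c) impose at the training points.

```python
import math


def gaussian_kernel(u, v, sigma=1.0):
    """Gaussian kernel exp(-||u - v||^2 / (2 sigma^2)); the width convention is assumed."""
    sq = sum((p - q) ** 2 for p, q in zip(u, v))
    return math.exp(-sq / (2.0 * sigma ** 2))


def kernel_boundary(coef, X_train, x, sigma=1.0):
    """One boundary value: sum_i coef_i * K(x_i, x)."""
    return sum(c * gaussian_kernel(xi, x, sigma) for c, xi in zip(coef, X_train))


def boundary_coefficients(mult, v):
    """Expansion coefficients read off (21)-(24).

    mult maps hypothetical keys "a1".."a7" to lists of multipliers;
    v = (v1, v2, v3, v4). The mapping to boundaries follows (9)-(12).
    """
    v1, v2, v3, v4 = v
    return {
        "upper_high": [p - v1 for p in mult["a1"]],                          # w1, (21)
        "lower_high": [v2 - p - q for p, q in zip(mult["a2"], mult["a5"])],  # w2, (22)
        "lower_low": [p - q - v3 for p, q in zip(mult["a3"], mult["a6"])],   # w3, (23)
        "upper_low": [v4 - p - q for p, q in zip(mult["a4"], mult["a7"])],   # w4, (24)
    }


def predict(coefs, X_train, x, sigma=1.0):
    """Return (upper, lower) intervals at x, each as a (low, high) pair."""
    f = {name: kernel_boundary(c, X_train, x, sigma) for name, c in coefs.items()}
    return (f["upper_low"], f["upper_high"]), (f["lower_low"], f["lower_high"])


def is_nested(upper, lower):
    """Upper low <= lower low <= lower high <= upper high."""
    return upper[0] <= lower[0] <= lower[1] <= upper[1]
```

Keeping the four coefficient lists separate mirrors the fact that each boundary has its own expansion, which is what lets the estimated intervals be asymmetric about any center.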

Experiments

In this section, we use three artificial datasets and a power load dataset to compare the regression performance of the proposed approach with that of Hong’s quadratic loss SVM and Hao’s two smaller QPP approach. All regression methods are implemented in MATLAB on Windows 7 running on a personal computer (PC). The Gaussian kernel is employed because it performs well in interval regression. Because the performance of these methods depends strongly on the choice of parameters, the best parameters for each algorithm are selected by leave-one-out (LOO) cross-validation combined with a grid search.

We use the following criteria to evaluate the algorithms. The goodness of fit $φ_{fit} (x_{i})$ for the ith data sample is defined as

φ_{fit} (x_{i}) = \frac{R_{Y_{* i}}}{R_{Y_{i}^{*}}}

where $R_{Y_{* i}}$ is the spread of the lower model, calculated as ${\bar{Y}}_{* i} - {\underline{Y}}_{* i}$, and $R_{Y_{i}^{*}}$ is the spread of the upper model, calculated as ${\bar{Y}}_{i}^{*} - {\underline{Y}}_{i}^{*}$. The index $φ_{fit} (x_{i})$ indicates how closely the lower model output approximates the upper model output for the ith input. Correspondingly, the measure of fit over all data, $φ_{fit} (x)$, is defined as

φ_{fit} (x) = \frac{1}{n} \sum_{i = 1}^{n} \frac{R_{Y_{* i}}}{R_{Y_{i}^{*}}}

where n is the sample size and $0 \leq φ_{fit} \leq 1$ because the lower model is included in the upper model. The larger the value of $φ_{fit}$, the better the model fits the data. The measure of vagueness of the estimated regression model is defined as

φ_{vague} (x) = \frac{1}{n} \sum_{i = 1}^{n} R_{Y_{i}^{*}}

The larger the value of $φ_{vague}$, the vaguer the estimated regression model. The sum squared error (SSE) measures for the estimated upper and lower regression models are defined as

\begin{matrix} φ_{Y^{*}, SSE} (x) = \frac{1}{n} \sum_{i = 1}^{n} ‖ Y_{i}^{*} - Y_{i} ‖_{H}^{2} \\ φ_{Y_{*}, SSE} (x) = \frac{1}{n} \sum_{i = 1}^{n} ‖ Y_{* i} - Y_{i} ‖_{H}^{2} \end{matrix}

where $‖ \cdot ‖_{H}^{2}$ is the squared Hausdorff distance between two intervals, defined as

‖ u - v ‖_{H} = \sqrt{\max \{{(u_{low} - v_{low})}^{2}, {(u_{high} - v_{high})}^{2}\}}
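The criteria above can be computed directly from the estimated boundaries. In the following minimal sketch (hypothetical function names), every interval is a (low, high) pair, so an observed interval $(y_{i}, e_{i})$ is passed as $(y_{i} - e_{i}, y_{i} + e_{i})$; passing the noise-free intervals instead of the observed ones gives the true SSE used later. Note that $\sqrt{\max \{ \cdot \}}$ of two squares is simply the larger absolute endpoint difference.

```python
def hausdorff(u, v):
    """Hausdorff distance between intervals u = (low, high) and v = (low, high)."""
    return max(abs(u[0] - v[0]), abs(u[1] - v[1]))


def evaluate(upper, lower, observed):
    """Compute phi_fit, phi_vague, and the two SSE measures.

    upper, lower, observed: equal-length lists of (low, high) intervals.
    """
    n = len(observed)
    spread_up = [hi - lo for lo, hi in upper]     # R_{Y*_i}
    spread_low = [hi - lo for lo, hi in lower]    # R_{Y_*i}
    return {
        "fit": sum(rl / ru for rl, ru in zip(spread_low, spread_up)) / n,
        "vague": sum(spread_up) / n,
        "sse_upper": sum(hausdorff(u, y) ** 2 for u, y in zip(upper, observed)) / n,
        "sse_lower": sum(hausdorff(l, y) ** 2 for l, y in zip(lower, observed)) / n,
    }
```

For two samples with upper intervals [0, 4] and [1, 3], lower intervals [1, 3] and [1.5, 2.5], and observed intervals [0.5, 3.5] and [1, 3], the sketch gives φ_fit = 0.5, φ_vague = 3.0, and SSE values of 0.125 (upper) and 0.25 (lower).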

Hao’s dataset with heteroscedastic uncertainty structure

The first dataset, which was used to validate Hao’s two smaller QPP approach, is a synthetic dataset with a heteroscedastic uncertainty structure: the spreads of the fuzzy outputs depend strongly on the input values. The data are generated by

\begin{matrix} x_{i} = 0.02 (i - 1), \quad i = 1, \dots, 51 \\ y_{i} = {(2.7 x_{i} - 0.2)}^{2} + 4.5 + \mathrm{err}_{i} [- n, n] \\ e_{i} = 1.7 \exp (- 49 {(x_{i} - 0.5)}^{2}) + 1.7 x_{i} + 1.2 \end{matrix}

where the noise $\mathrm{err}_{i} [- n, n]$ is drawn from a uniform distribution on the interval [−0.4, 0.4], that is, n = 0.4.
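The generating equations can be reproduced with a short sketch. The random seed and the use of Python’s `random.Random` are assumptions (the text states only that the noise is uniform on [−0.4, 0.4]), so individual samples will differ from those plotted in Figure 1; the function name is hypothetical.

```python
import math
import random


def hao_dataset(seed=0, noise=0.4):
    """Return 51 samples (x_i, y_i, e_i) following the generating equations above."""
    rng = random.Random(seed)          # seeded for reproducibility (assumption)
    samples = []
    for i in range(1, 52):
        x = 0.02 * (i - 1)
        y = (2.7 * x - 0.2) ** 2 + 4.5 + rng.uniform(-noise, noise)   # noisy center
        e = 1.7 * math.exp(-49 * (x - 0.5) ** 2) + 1.7 * x + 1.2       # radius
        samples.append((x, y, e))
    return samples
```

The radius peaks near x = 0.5, where $e_{i}$ = 1.7 + 0.85 + 1.2 = 3.75, which is what makes the uncertainty heteroscedastic.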

Figure 1 shows the upper and lower models estimated by Hong’s quadratic loss SVM approach, Hao’s two smaller QPP approach, and the proposed approach. The outer two solid curves represent the estimated upper model, and the inner two dashed curves represent the estimated lower model. As shown in Figure 1, all three approaches approximate the interval data well from both the outside and the inside. However, among the three approaches, Hong’s upper model has the widest spread and its lower model the narrowest. Hao’s method performs better than Hong’s: its upper model has a narrower spread and its lower model wider boundaries. However, the observed intervals $Y (x_{i})$ of the third, fourth, and sixth samples do not lie between the upper model $Y^{*}$ and the lower model $Y_{*}$, because the slack variables $ξ_{3 i}$ and $ξ_{3 i}^{*}$ in Hao’s method can be nonzero. If $C_{4}$ in equation (8) takes a larger value, all training data can lie between the upper model and the lower model, but the regression results then become similar to those of Hong’s approach. The proposed approach yields the smallest distance between the upper model and the lower model, and no data violate the constraints.

Figure 1.

The fuzzy output estimated by (a) Hong’s quadratic loss SVM, (b) Hao’s two smaller QPP approach, and (c) the proposed approach on Hao’s dataset.

Table 1 compares the regression performance of Hong’s quadratic loss SVM method, Hao’s two smaller QPP method, and the proposed method. The true risk of the estimated upper and lower models is evaluated against the true center function without noise, that is, against ${Y_{i}}^{'} = ({y_{i}}^{'}, e_{i})$, defined as

\begin{matrix} {y_{i}}^{'} = {(2.7 x_{i} - 0.2)}^{2} + 4.5 \\ e_{i} = 1.7 \exp (- 49 {(x_{i} - 0.5)}^{2}) + 1.7 x_{i} + 1.2 \end{matrix}

Table 1.

A comparison of the regression performance of Hong’s approach, Hao’s approach, and the proposed approach on Hao’s dataset.

Method	$φ_{fit} (x)$	$φ_{vague} (x)$	$φ_{Y^{*}, SSE} (x)$	$φ_{Y_{*}, SSE} (x)$	$φ_{Y^{*}, TSSE} (x)$	$φ_{Y_{*}, TSSE} (x)$	Training time (s)
Hong’s approach	0.6308	3.0614	0.7563	0.7569	0.6372	0.6378	7.4257
Hao’s approach	0.7625	2.7506	0.4204	0.5770	0.3408	0.4974	4.2894
Proposed approach	0.8183	2.7209	0.3916	0.3998	0.3003	0.3034	0.4992

The true sum squared error (TSSE) of the estimated upper and lower regression models is then defined as

\begin{matrix} φ_{Y^{*}, TSSE} (x) = \frac{1}{n} \sum_{i = 1}^{n} ‖ Y_{i}^{*} - {Y_{i}}^{'} ‖_{H}^{2} \\ φ_{Y_{*}, TSSE} (x) = \frac{1}{n} \sum_{i = 1}^{n} ‖ Y_{* i} - {Y'}_{i} ‖_{H}^{2} \end{matrix}

As shown in Table 1, all three methods generalize well when $y_{i}$ is noisy, because all three approaches implement the principle of structural risk minimization. However, the training time of the proposed approach is significantly shorter than that of Hong’s quadratic loss SVM and Hao’s two smaller QPP approach. Hong’s approach solves one larger QPP with 7n variables, so its training time complexity is $O ((7 n)^{3})$, where n is the number of training samples. Hao’s approach solves two smaller QPPs with 4n and 3n variables, so its training time complexity is $O ((4 n)^{3} + (3 n)^{3})$. In contrast, the proposed approach solves four smaller QPPs with n, 2n, 2n, and 2n variables, so its training time complexity is $O (n^{3} + (2 n)^{3} + (2 n)^{3} + (2 n)^{3})$. The leading cubic terms are therefore $343 n^{3}$, $91 n^{3}$, and $25 n^{3}$, respectively, so the proposed approach trains faster than both Hong’s approach and Hao’s approach.
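The complexity comparison reduces to simple arithmetic. The following minimal sketch applies the same cubic cost model, which ignores constant factors, kernel evaluation, and solver details; it therefore describes only the asymptotic trend, not the measured times in Table 1. The function names are hypothetical, and n = 51 is used only because it is the size of Hao’s dataset.

```python
def cubic_cost(qpp_sizes):
    """Leading-order cost of solving QPPs with the given variable counts, assuming O(m^3) each."""
    return sum(m ** 3 for m in qpp_sizes)


def relative_costs(n):
    """Cubic cost of Hong's and Hao's approaches relative to the proposed approach."""
    hong = cubic_cost([7 * n])                     # one QPP with 7n variables
    hao = cubic_cost([4 * n, 3 * n])               # two QPPs with 4n and 3n variables
    proposed = cubic_cost([n, 2 * n, 2 * n, 2 * n])
    return hong / proposed, hao / proposed         # 343/25 and 91/25 for any n


print(relative_costs(51))
```

The ratios are 343/25 = 13.72 for Hong’s approach and 91/25 = 3.64 for Hao’s approach, independent of n.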

Hao’s dataset with outliers

Many regression models for interval data are strongly influenced by outliers. To verify the regression performance of the three approaches on data with outliers, a few samples in Hao’s dataset are replaced by outliers to form the second example. As illustrated in Figure 2, Hong’s approach is the most affected by outliers among the three methods, because its inclusion constraints contain no slack variables; consequently, every training sample must lie between the upper model and the lower model, whether or not it is an outlier. Hao’s approach and the proposed approach reduce the influence of outliers, but their boundaries are still pulled toward the outliers, which degrades regression performance. As shown in Table 2, most criteria are worse than the corresponding values in Table 1.

Figure 2.

The fuzzy output estimated by (a) Hao’s two smaller QPP approach, (b) Hong’s quadratic loss SVM, and (c) the proposed approach on Hao’s dataset with outliers.

Table 2.

A comparison of the regression performance of Hong’s approach, Hao’s approach, and the proposed approach on Hao’s dataset with outliers.

Method	$φ_{fit} (x)$	$φ_{vague} (x)$	$φ_{Y^{*}, SSE} (x)$	$φ_{Y_{*}, SSE} (x)$	$φ_{Y^{*}, TSSE} (x)$	$φ_{Y_{*}, TSSE} (x)$	Training time (s)
Hong’s approach	0.5518	3.6773	1.4415	0.8778	1.3667	0.6892	6.4584
Hao’s approach	0.7081	2.6947	0.4544	0.8486	0.3584	0.7001	4.6382
Proposed approach	0.8039	2.7873	0.5587	0.5454	0.3271	0.3934	0.5928

Asymmetrical interval dataset

When errors affect the center and radius of interval data, the error ranges of the upper and lower interval ends are the same. In practice, however, the upper and lower ends often have different error ranges; such data are called asymmetrical interval data. In the third example, the regression performance of the three algorithms is verified on an asymmetrical interval dataset taken from Xu et al.,⁴⁷ which is generated by

\begin{matrix} y_{i}^{R} = 13.5 x_{i} + 0.3 + x_{i} \mathrm{err}_{i} [- 2, 2] \\ y_{i}^{L} = 13.5 x_{i} + 6.3 + x_{i} \mathrm{err}_{i} [- 1, 1] \\ x_{i} = 0.04 (i + 1), \quad i = 1, \dots, 20 \end{matrix}

where $\mathrm{err}_{i} [- n, n]$ represents an error randomly generated in the interval [−n, n].
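A minimal sketch of this generator follows. The seed and the use of uniform draws for $\mathrm{err}_{i}$ are assumptions (the text says only that the error is randomly generated in the interval), and the function name is hypothetical. Because the offsets differ by 6.0 while the combined noise is at most $3 x_{i} \leq 2.52$, $y_{i}^{L}$ always exceeds $y_{i}^{R}$ here, and the two endpoints carry noise of different magnitudes.

```python
import random


def asymmetric_dataset(seed=0):
    """Return 20 samples (x_i, y_i^R, y_i^L) following the generating equations above."""
    rng = random.Random(seed)          # seeded, uniform noise (assumptions)
    samples = []
    for i in range(1, 21):
        x = 0.04 * (i + 1)
        y_r = 13.5 * x + 0.3 + x * rng.uniform(-2.0, 2.0)   # endpoint with noise x * [-2, 2]
        y_l = 13.5 * x + 6.3 + x * rng.uniform(-1.0, 1.0)   # endpoint with noise x * [-1, 1]
        samples.append((x, y_r, y_l))
    return samples
```

The interval width $y_{i}^{L} - y_{i}^{R}$ always stays within $6 \pm 3 x_{i}$.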

For nonlinear regression problems, all SVM-based regression methods map the input data into a high-dimensional feature space and perform linear regression in that space. In essence, therefore, an SVM-based approach performs linear regression in some space, whether the relationship in the input space is linear or nonlinear. Thus, to show the regression behavior on the asymmetrical interval dataset clearly, all three approaches use linear regression. The results of this example are shown in Figure 3.

Figure 3.

The fuzzy output estimated by (a) Hong’s quadratic loss SVM, (b) Hao’s two smaller QPP approach, and (c) the proposed approach on the asymmetrical interval dataset.

As shown in Figure 3, Hong’s quadratic loss SVM approach and Hao’s two smaller QPP approach produce similar regression results. In both approaches, the upper and lower models are obtained by adding the estimated radius to, and subtracting it from, the estimated center, so the resulting boundaries cannot truly reflect the distribution trend of the asymmetrical training dataset. By contrast, for the proposed approach, the distance between the upper model and the lower model is smaller at the high ends of the intervals than at the low ends, which matches the narrower error range of the high ends in this dataset. This is because the estimates of the proposed approach are computed by four independent regression models, which widens the scope of application of the proposed approach. As shown in Table 3, the SSE values of both the upper model and the lower model of the proposed approach are smaller than those of Hong’s and Hao’s approaches, which gives the proposed approach better $φ_{fit} (x)$ and $φ_{vague} (x)$ values than the other approaches.

Table 3.

A comparison of the regression performance of Hong’s approach, Hao’s approach, and the proposed approach on the asymmetrical interval dataset.

Approach	$φ_{fit} (x)$	$φ_{vague} (x)$	$φ_{Y^{*}, SSE} (x)$	$φ_{Y_{*}, SSE} (x)$	Training time (s)
Hong’s approach	0.6610	3.5960	0.7456	0.8625	2.3244
Hao’s approach	0.6213	3.5408	0.6886	1.0224	0.3588
Proposed approach	0.7498	3.5103	0.6508	0.6958	0.2184

Power load dataset

This article uses the load data of a Chinese power company from 1 January 2014 to 30 December 2016 as the forecast time series, with predictions made on a daily basis. Forecasting on a monthly basis is also possible; however, the longer the time span, the larger the radius of the resulting interval data and the greater its volatility, which degrades the forecast results. The data from the first 2 years are used to build the prediction model, and the data from the last year are used to test it. As shown in Table 4, the proposed approach achieves better estimation performance than the other two approaches. As the training set grows, the training time of Hong’s and Hao’s approaches increases significantly, so the training-time advantage of the proposed approach becomes more pronounced.
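The text does not specify how the daily load intervals are constructed. Purely as an assumption for illustration, the sketch below takes each period's interval as [min, max] of its load readings, uses hypothetical synthetic hourly readings in place of the company's data, and splits the series chronologically as described (first 2 years for training, last year for testing). It also illustrates why a longer time span widens the interval: under this min/max construction, a monthly interval covers every daily interval in that month, so its radius can never be smaller. All function names are hypothetical.

```python
import random
from datetime import date, timedelta


def daily_intervals(readings):
    """Map {day: [load readings]} to {day: (low, high)}.

    Hypothetical construction: a day's interval is [min, max] of its readings.
    """
    return {day: (min(vals), max(vals)) for day, vals in readings.items()}


def covering_interval(intervals):
    """Smallest (low, high) interval containing all the given intervals."""
    lows, highs = zip(*intervals)
    return min(lows), max(highs)


def center_radius(interval):
    low, high = interval
    return (low + high) / 2, (high - low) / 2


def chronological_split(intervals, cutoff):
    """Days before `cutoff` are used for training, the rest for testing."""
    train = {d: iv for d, iv in intervals.items() if d < cutoff}
    test = {d: iv for d, iv in intervals.items() if d >= cutoff}
    return train, test


# Synthetic hourly readings standing in for the real load data.
rng = random.Random(42)
readings = {}
day = date(2014, 1, 1)
while day <= date(2016, 12, 30):
    readings[day] = [1000 + rng.uniform(-200, 200) for _ in range(24)]
    day += timedelta(days=1)

daily = daily_intervals(readings)
train, test = chronological_split(daily, cutoff=date(2016, 1, 1))
print(len(train), len(test))  # 730 365

# Daily vs monthly granularity for January 2014.
jan = [iv for d, iv in daily.items() if d.year == 2014 and d.month == 1]
monthly_radius = center_radius(covering_interval(jan))[1]
max_daily_radius = max(center_radius(iv)[1] for iv in jan)
print(monthly_radius >= max_daily_radius)  # True
```

Swapping the synthetic `readings` for real timestamped load data would yield the daily interval series; a different interval construction could be substituted in `daily_intervals` without changing the rest of the pipeline.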

Table 4.

A comparison of the regression performance of Hong’s approach, Hao’s approach, and the proposed approach on the power load dataset.

Approach	$φ_{fit} (x)$	$φ_{vague} (x)$	$φ_{Y^{*}, SSE} (x)$	$φ_{Y_{*}, SSE} (x)$	Training time (s)
Hong’s approach	0.3812	33.3338	63.5155	42.8408	56.6404
Hao’s approach	0.3750	29.2223	16.6285	15.9338	21.1875
Proposed approach	0.7166	24.9941	10.9171	10.3343	5.2188

Conclusion

This article focuses on dual possibilistic regression models of interval data, which are widely used to represent power load. In the spirit of Twin-SVM, ITSVR, and the two smaller QPP approach, which estimate regression models by solving smaller SVM-type QPPs rather than a single large QPP and thereby train significantly faster, an efficient SVM-based interval regression approach that solves a group of small QPPs is proposed. The proposed approach employs four small SVM-type QPPs to estimate the upper and lower regression models. Each small QPP contains fewer optimization variables than those of Hao’s and Hong’s approaches, so the proposed approach requires less training time. Because slack variables are used in the modeling process of both the proposed approach and Hao’s approach, the training data may violate the modeling constraints, which allows both approaches to handle datasets containing noise and outliers. The proposed approach obtains four nonparallel functions, each of which determines one of the inner or outer boundaries of the interval data, and can therefore better fit asymmetric interval data. However, the proposed models have more parameters, and the regression performance depends strongly on the parameter values. Investigating intelligent optimization methods to find the optimal parameter values is therefore our future work.

Footnotes

Handling Editor: Xiaojiang Du

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the Guangdong Province Key Area R&D Program of China under grant no. 2019B010137004; the National Natural Science Foundation of China under grant nos U1636215, 61871140, and 61972108; and the National Key Research and Development Plan under grant no. 2018YFB0803504.

ORCID iD

Hui Lu

References

1. Chen. Security in wireless sensor networks. IEEE Wirel Commun Mag 2008; 15(4): 60–66.

2. Xiao, Guizani, et al. An effective key management scheme for heterogeneous sensor networks. Ad Hoc Netw 2007; 5(1): 24–34.

3. Guizani, Xiao, et al. A routing-driven elliptic curve cryptography based key management scheme for heterogeneous sensor networks. IEEE T Wirel Commun 2009; 8(3): 1223–1229.

4. Sun, et al. Deep reinforcement learning for partially observable data poisoning attack in crowdsensing systems. IEEE Internet Things J. Epub ahead of print 30 December 2019. DOI: 10.1109/JIOT.2019.2962914.

5. Xiao, Huang, et al. Cloud-based malware detection game for mobile devices with offloading. IEEE T Mobile Comput 2017; 16(10): 2742–2750.

6. Tian, Qiu, et al. A data leakage prevention method based on the reduction of confidential and context terms for smart mobile devices. Wirel Commun Mob Com 2018; 2018: 5823439.

7. Tian, Qiu, et al. Block-DEF: a secure digital evidence framework using blockchain. Inform Sci 2019; 491: 151–165.

8. Tian, Shi, et al. A data-driven model for future Internet route decision modeling. Fut Gener Comp Syst 2019; 95: 212–220.

9. Tian, Gao, et al. A novel reputation framework for identifying denial of traffic service in internet of connected vehicles. IEEE Internet Things J. Epub ahead of print 5 November 2019. DOI: 10.1109/JIOT.2019.2951620.

10. Dong, Zhang, et al. A detection method for a novel DDoS attack against SDN controllers by vast new low-traffic flows. In: Proceedings of the IEEE ICC 2016, Kuala Lumpur, Malaysia, 22–27 May 2016. New York: IEEE.

11. Qiu, Zhang, et al. Nei-TTE: intelligent traffic time estimation based on fine-grained time derivation of road segments for smart city. IEEE T Ind Inform 2019; 16(4): 2659–2666.

12. Tian, Luo, Qiu, et al. A distributed deep learning system for web attack detection on edge devices. IEEE T Ind Inform 2020; 16(3): 1963–1971.

13. Qiu, Tian, et al. A survey on access control in the age of internet of things. IEEE Internet Things J. Epub ahead of print 24 January 2020. DOI: 10.1109/JIOT.2020.2969326.

14. Xiao, Wan, Dai, et al. Security in mobile edge caching with reinforcement learning. IEEE Wirel Commun 2018; 25(3): 116–122.

15. Wang, et al. An out-of-band authentication scheme for internet of things using blockchain technology. In: Proceedings of IEEE ICNC 2018, Maui, HI, 5–8 March 2018. New York: IEEE.

16. Qiu, Yang, et al. An graph-based adaptive method for fast detection of transformed data leakage in IOT via WSN. IEEE Access 2019; 7: 137111–137121.

17. Tian, Shi, Wang, et al. Real time lateral movement detection based on evidence reasoning network for edge computing environment. IEEE T Ind Inform 2019; 15(7): 4285–4294.

18. Jiang, Gupta, et al. Deep learning based multichannel intelligent attack detection for data security. IEEE T Sustain Comput. Epub ahead of print 15 January 2018. DOI: 10.1109/TSUSC.2018.2793284.

19. Shen, Zhu, et al. Cloud-based approximate constrained shortest distance queries over encrypted graphs with privacy protection. IEEE T Inf Foren Sec 2018; 13(4): 940–953.

20. Tian, Qiu, et al. An intrusion detection algorithm based on feature graph. Comput Mater Con 2019; 61(1): 259–273.

21. Xiao, Zhang, et al. Internet protocol television (IPTV): the killer application for the next generation Internet. IEEE Commun Mag 2007; 45(11): 126–134.

22. Chapaloglou, Nesiadis, Iliadis, et al. Smart energy management algorithm for load smoothing and peak shaving based on load forecasting of an island’s power system. Appl Energ 2019; 238: 627–642.

23. Zhou, Jin. Holographic ensemble forecasting method for short-term power load. IEEE T Smart Grid 2019; 10(1): 425–434.

24. Liu. A least squares support vector machine model optimized by moth-flame optimization algorithm for annual power load forecasting. Appl Intell 2016; 45(5): 1166–1178.

25. Wan, Liu. Application of interval time-series vector autoregressive model in short-term load forecasting. Power Syst Tech 2012; 36(11): 77–81.

26. Diamond. Fuzzy least squares. Inform Sci 1988; 46(3): 141–157.

27. Celmins. Least squares model fitting to fuzzy vector data. Fuzzy Set Syst 1987; 22(3): 245–269.

28. D’Urso, Santoro. Goodness of fit and variable selection in the fuzzy multiple linear regression. Fuzzy Set Syst 2006; 157(19): 2627–2647.

29. Tseng. A new approach to fuzzy regression models with application to business cycle analysis. Fuzzy Set Syst 2002; 130: 33–42.

30. Tanaka, Lee. Interval regression analysis by quadratic programming approach. IEEE T Fuzzy Syst 1998; 6(4): 473–481.

31. Guo, Tanaka. Dual models for possibilistic regression analysis. Comput Stat Data Anal 2006; 51: 253–266.

32. Ishibuchi, Tanaka. Fuzzy regression analysis using neural networks. Fuzzy Set Syst 1992; 50: 57–65.

33. Hashiyama, Furuhashi, Uchikawa. An interval fuzzy model using a fuzzy neural network. In: IEEE international conference on neural networks, Baltimore, MD, 7–11 June 1992, pp.745–750. New York: IEEE.

34. Ishibuchi, Tanaka, Okada. An architecture of neural networks with interval weights and its application to fuzzy regression analysis. Fuzzy Set Syst 1993; 57: 27–39.

35. Ishibuchi, Nii. Fuzzy regression using asymmetric fuzzy coefficients and fuzzified neural networks. Fuzzy Set Syst 2001; 119: 273–290.

36. Cheng, Lee. Fuzzy regression with radial basis function network. Fuzzy Set Syst 2001; 119: 291–301.

37. Tanaka, Uejima, Asai. Linear regression analysis with fuzzy model. IEEE T Syst Man Cyb 1982; 12(6): 903–907.

38. Tanaka. Fuzzy data analysis by possibilistic linear models. Fuzzy Set Syst 1987; 24: 363–375.

39. Tanaka, Watada. Possibilistic linear systems and their application to the linear regression model. Fuzzy Set Syst 1988; 27(3): 275–289.

40. Tanaka, Hayashi, Watada. Possibilistic linear regression analysis for fuzzy data. Eur J Oper Res 1989; 40(3): 389–396.

41. Vapnik. The nature of statistical learning theory. Berlin: Springer-Verlag, 1995.

42. Hong, Hwang. Interval regression analysis using quadratic loss support vector machine. IEEE T Fuzzy Syst 2005; 13: 229–237.

43. Hwang, Hong, Seok. Support vector interval regression machine for crisp input and output data. Fuzzy Set Syst 2006; 157(8): 1114–1125.

44. Jeng, Chuang. Support vector interval regression networks for interval regression analysis. Fuzzy Set Syst 2003; 138: 283–300.

45. Chuang. Extended support vector interval regression networks for interval input–output data. Inform Sci 2008; 178(3): 871–891.

46. Hao. Interval regression analysis using support vector networks. Fuzzy Set Syst 2009; 160(17): 2466–2485.

47. Luo, et al. Asymmetrical interval regression using extended ε-SVM with robust algorithm. Fuzzy Set Syst 2009; 160(7): 988–1002.

48. Jayadeva, Khemchandani, Chandra. Twin support vector machines for pattern classification. IEEE T Pattern Anal 2007; 29(5): 905–910.

49. Peng, Chen, Kong, et al. Interval twin support vector regression algorithm for interval input–output data. Int J Mach Learn Cyb 2015; 6(5): 719–732.

50. Hao. Dual possibilistic regression analysis using support vector networks. Fuzzy Set Syst 2020; 387: 1–34.