
Abstract

As an important research direction for the future innovation and development of the energy industry, the integrated energy system (IES) requires short-term load forecasting as a basis for decision-making. To address the high volatility and strong randomness of IES short-term load, a short-term load forecasting model is proposed that combines improved complete ensemble empirical mode decomposition with adaptive noise (ICEEMDAN), the maximum information coefficient (MIC), and a Transformer. ICEEMDAN reduces the complexity of the time series and alleviates the impact of modal aliasing. MIC is then used for correlation analysis and reconstruction of the decomposed intrinsic mode functions, strengthening the coupling relationship between multivariate loads. Finally, a Transformer model based on the self-attention mechanism predicts each component, and the component predictions are combined into the final result. Horizontal comparison experiments and vertical ablation experiments were conducted on the IES dataset from Arizona State University in the United States. The results show that the predictive performance of the model is improved to a certain extent and that the model has application prospects.

Keywords

Integrated energy system, short-term load forecasting, improved complete ensemble empirical mode decomposition with adaptive noise, maximum information coefficient, transformer

Introduction

In recent years, new energy technology has continued to progress and iterate. A new model of energy consumption, with renewable and clean energy as an important guide and key benchmark, is also gradually being formed and improved (Lilhore et al., 2025; Simaiya et al., 2024; Sunder et al., 2024). Against the background of abundant energy supply and diversified energy demand, the concept of the integrated energy system (IES) has been proposed and has become a research hotspot in the energy field (Chowdary et al., 2025). Overall, an IES takes the electric power system as its core, utilizes various types of energy equipment with complex functions, and breaks down the interaction barriers between different energy systems. It realizes integrated planning and collaborative management at the global level, and optimized operation, coordination, and stability at the local level (Alofaysan et al., 2024). Specifically, to meet users' demand for cold, heat, gas, and electricity in production and daily life, the energy supplier adjusts the energy planning and supply proportions according to macro and micro factors, so as to achieve rational, efficient, and coordinated scheduling of regional energy. Therefore, the development of IES not only places higher requirements on energy generation, transmission, transformation, and storage in the process of multivariate load interaction, but also poses a great challenge to the speed of energy information exchange and to processing capability.

As the decision support for IES energy and communication interaction, short-term load forecasting affects the whole IES through its accuracy and response time. Unlike single-load forecasting, load forecasting under IES should further consider the coupling relationship between multiple energy sources and improve the accuracy of multi-energy load forecasting, thus enabling energy suppliers to realize fine-grained scheduling of integrated energy sources (Gielen et al., 2019). As shown in Figure 1, by utilizing equipment such as chillers, electric heat pumps, combined heat and power (cogeneration) units, gas boilers, power-to-gas (P2G) systems, and fuel cells, energy suppliers can jointly dispatch integrated energy sources at the district level according to the multivariate energy demands of users, increasing the overall energy use efficiency and reducing unnecessary losses (Smeers et al., 2021).

Figure 1.

The interaction structure of integrated energy system (IES) energy and information.

Since the concept of IES was proposed relatively recently, most IES load forecasting models are based on deep learning models with strong nonlinear function fitting capability. Li et al. proposed an IES load forecasting method based on vector autoregression (Li et al., 2018). They considered the correlation between electricity, gas, and cold loads, and added temperature as an external variable for load prediction. A model for short-term dual load demand forecasting of electricity and gas, based on a radial basis function neural network (RBF-NN), was proposed by Tang et al. (Tang et al., 2019). This model fully considers the coupling relationship of the loads and strengthens the spatiotemporal correlation by adding electricity price factors; the RBF-NN model is then used to predict the two types of loads separately. As the amount of IES data increases, larger and deeper models are being applied. Wang et al. combined three models with different structures, namely a convolutional neural network (CNN), a gated recurrent unit (GRU), and a gradient boosting regression tree, to form a multivariate load forecasting model (Wang et al., 2021). Zhang et al. proposed a load forecasting method based on a deep belief network (DBN) and multi-task regression layers, which was applied to electricity, heat, and gas energy prediction. In order to extract hidden features from multivariate load time series, Zhou et al. proposed another prediction model based on DBN, which achieved net load prediction for both producers and consumers (Zhou et al., 2021).

Transformer was proposed in reference Vaswani et al. (2017) and subsequently applied in various industries. Wang et al. improved the basic model and proposed a short-term load forecasting model for IES (Wang et al., 2022). This model utilizes self-attention mechanism to effectively couple the relationship between electricity, cold, and heat loads, achieving the goal of synchronous input/output. The experimental results show that this method has better predictive performance compared to traditional methods.

In this study, we propose an IES short-term load forecasting model that combines improved complete ensemble empirical mode decomposition with adaptive noise (ICEEMDAN), the maximum information coefficient (MIC), and a Transformer, and that achieves high-precision synchronous prediction of short-term load time series in IES. Considering the volatility and randomness of IES short-term load time series, a signal decomposition algorithm is used to decompose the original signal. As a cutting-edge algorithm in time series decomposition, ICEEMDAN performs well in reducing signal complexity. The decomposed intrinsic mode function (IMF) components are then subjected to correlation analysis and reconstruction with the MIC algorithm. The reconstructed multivariate load data are used as the input of the encoder, and the single load data are used as the input of the decoder; the model synchronously outputs the multivariate predictions, from which the final result is obtained.

The rest of this article is organized as follows. The “Methodology” section introduces the algorithms and models used in this article. The “Data analysis and index description” section provides a brief analysis and description of the IES dataset and evaluation metrics. The experimental results and analysis are presented in the “Result analysis” section. The “Conclusion” section summarizes the article.

Methodology

ICEEMDAN Algorithm

The energy use behavior of energy users changes over time, so load signals are typical time series signals. This type of time series signal is highly complex and random. A common way to reduce this complexity and randomness is to split the original signal into several IMFs using a decomposition method. As a classical adaptive decomposition method, empirical mode decomposition (EMD) is good at handling nonlinear and non-stationary data (Wang et al., 2022), and it has therefore also been applied in load forecasting studies. However, during the iterations of the EMD algorithm, the IMF components can be affected by mode mixing, which leads to spurious time-frequency distributions and causes the IMFs to lose their proper physical meaning.

Subsequent researchers have improved the EMD algorithm. Ensemble empirical mode decomposition (EEMD) adds normally distributed white noise to the original signal, which reduces the error caused by mode mixing (Deng et al., 2020). However, with a limited number of ensemble averages, this algorithm suffers from large reconstruction error and poor decomposition completeness. To solve this problem, Lai et al. proposed the complete EEMD algorithm with adaptive white noise (Lai et al., 2018). Compared with these earlier algorithms, the ICEEMDAN algorithm effectively reduces the noise in the components and enhances their physical significance. The ICEEMDAN algorithm proceeds as follows.

step 1

Add $E_{1}(\omega^{(i)})$ to the raw load signal $f$ to construct a new load signal:

$$F^{(i)} = f + \beta_{0} E_{1}(\omega^{(i)}) \tag{1}$$

where $\beta_{0} = \epsilon_{0}\,\mathrm{std}(f)/\mathrm{std}(E_{1}(\omega^{(i)}))$ is the noise coefficient, $\mathrm{std}(\cdot)$ is the standard deviation operator, $E_{k}(\cdot)$ is the $k$th mode obtained from the EMD algorithm, $\omega^{(i)}$ is the $i$th group of white noise that follows the standard normal distribution, and $\epsilon_{0}$ is the magnitude of the white noise.

step 2

Calculate the local mean by the EMD algorithm to obtain the residual $R_{1}$ of the first decomposition:

$$R_{1} = \mathrm{AVE}(N(F^{(i)})) \tag{2}$$

where $\mathrm{AVE}(\cdot)$ denotes averaging over the noise realizations $i$, and $N(\cdot)$ is the local mean of a signal.

step 3

Calculate the first component $\mathrm{IMF}_{1}$ from the raw load signal and the residual of the first decomposition:

$$\mathrm{IMF}_{1} = f - R_{1} \tag{3}$$

step 4

Add $I$ groups of Gaussian white noise to the residual of the first decomposition to construct the new signals $R_{1} + \beta_{1} E_{2}(\omega^{(i)})$ to be decomposed, from which the residual $R_{2}$ of the second decomposition can be computed. Then calculate the second component $\mathrm{IMF}_{2}$:

$$\mathrm{IMF}_{2} = R_{1} - R_{2} = R_{1} - \mathrm{AVE}(N(R_{1} + \beta_{1} E_{2}(\omega^{(i)}))) \tag{4}$$

step 5

Compute the residual $R_{k}$ of the $k$th decomposition, where $k = 3, 4, \dots$ indexes the modes and the superscript $(i)$ still indexes the noise realizations:

$$R_{k} = \mathrm{AVE}(N(R_{k-1} + \beta_{k-1} E_{k}(\omega^{(i)}))) \tag{5}$$

step 6

Calculate the $k$th component $\mathrm{IMF}_{k}$:

$$\mathrm{IMF}_{k} = R_{k-1} - R_{k} \tag{6}$$

step 7

Return to Step 5 and compute the next mode, repeating until the residual has fewer than two local extrema, at which point the decomposition stops.
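The recursion in Steps 1 to 7 can be sketched in a few lines of Python. This is only a structural illustration under stated assumptions, not the implementation used in the paper: a centred moving average stands in for the envelope-based local mean $N(\cdot)$ of real EMD, the stand-in `emd_mode` plays the role of the EMD mode operator, the number of modes is fixed instead of using the extrema-based stopping rule, and the coefficient for later modes is assumed to be $\epsilon_{0}$ times the standard deviation of the previous residual, which the text does not specify. All function names and default values are hypothetical.

```python
import numpy as np

def local_mean(signal, width=5):
    """Stand-in for N(.): centred moving average (real EMD uses envelope means)."""
    kernel = np.ones(width) / width
    padded = np.pad(signal, width // 2, mode="edge")
    return np.convolve(padded, kernel, mode="valid")

def emd_mode(noise, k):
    """Stand-in for the k-th EMD mode of a noise realization."""
    residual = noise
    for _ in range(k - 1):
        residual = local_mean(residual)
    return residual - local_mean(residual)

def iceemdan_sketch(f, n_modes=4, n_noise=20, eps0=0.2, seed=0):
    """Return (imfs, final_residual) following the recursion of Eqs. (1)-(6)."""
    f = np.asarray(f, dtype=float)
    rng = np.random.default_rng(seed)
    noises = rng.standard_normal((n_noise, len(f)))   # white-noise realizations
    imfs, prev_residual = [], f
    for k in range(1, n_modes + 1):
        means = []
        for w in noises:
            e_k = emd_mode(w, k)
            if k == 1:
                beta = eps0 * np.std(f) / np.std(e_k)   # beta_0 as defined in Step 1
            else:
                beta = eps0 * np.std(prev_residual)     # assumed form for later modes
            means.append(local_mean(prev_residual + beta * e_k))
        residual = np.mean(means, axis=0)               # AVE(N(.)) over realizations
        imfs.append(prev_residual - residual)           # previous residual minus new one
        prev_residual = residual
    return np.array(imfs), prev_residual
```

Because each IMF is the difference of two consecutive residuals, the IMFs and the final residual sum back to the original signal by construction, which is the completeness property discussed for the CEEMDAN family.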

MIC Algorithm

Correlation analysis is a crucial method in the field of data science that aims to measure the closeness of the correlation between multivariate variables and factors, while also reflecting, to some extent, the causal relationship between multivariate variables and factors. Commonly used correlation analysis methods include Pearson Correlation Coefficient, Spearman’s Rank Correlation Coefficient, K-nearest Neighbor, and MIC (Reshef et al., 2011).

Due to the high robustness, low complexity, and standardization characteristics of MIC, it also has good correlation analysis performance for nonlinear data. Specifically, the mathematical expression of the MIC algorithm is as follows:

step 1

Let $X$ and $Y$ be two random variables in the dataset, where $X = \{x_{1}, \dots, x_{n}\}$, $Y = \{y_{1}, \dots, y_{n}\}$, and $n$ is the sample size. The mutual information (MI) between $X$ and $Y$ is defined as:

$$I(x; y) = \iint p(x, y) \log_{2} \frac{p(x, y)}{p(x)\,p(y)} \,\mathrm{d}x\,\mathrm{d}y \tag{7}$$

where $p (x, y)$ is the joint probability density between $X$ and $Y$ , $p (x)$ and $p (y)$ represent the marginal probability density of random variables $X$ and $Y$ , respectively.

step 2

Draw a grid on the scatter plot of the data formed by $X$ and $Y$, and calculate the MI of the resulting discretized distribution. Repeating this for different grid partitions and taking the maximum normalized MI gives:

$$\mathrm{MIC}(x; y) = \max_{a \times b < B} \frac{I(x; y)}{\log_{2} \min(a, b)} \tag{8}$$

where $a$ and $b$ are the numbers of grid cells in the $X$ and $Y$ directions, respectively, and $B$ is the upper bound on the grid size $a \times b$.
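As a rough illustration of Eqs. (7) and (8), the sketch below approximates MIC by trying every equal-frequency grid with $a \times b < B$ and keeping the largest normalized MI. It is a simplification under stated assumptions: the original MIC algorithm (Reshef et al., 2011) optimizes the grid boundaries rather than fixing them at quantiles, so this version gives a lower bound on the true score. The bound $B = n^{0.6}$ and the function names are assumptions for illustration only.

```python
import numpy as np

def grid_mutual_information(x, y, a, b):
    """MI in bits of (x, y) discretized on an a-by-b equal-frequency grid."""
    x_edges = np.quantile(x, np.linspace(0.0, 1.0, a + 1))
    y_edges = np.quantile(y, np.linspace(0.0, 1.0, b + 1))
    counts, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges])
    p_xy = counts / counts.sum()                 # joint distribution on the grid
    p_x = p_xy.sum(axis=1, keepdims=True)        # marginal of x, shape (a, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)        # marginal of y, shape (1, b)
    mask = p_xy > 0                              # skip empty cells (0 * log 0 = 0)
    return float(np.sum(p_xy[mask] * np.log2(p_xy[mask] / (p_x @ p_y)[mask])))

def approx_mic(x, y, alpha=0.6):
    """Largest normalized grid MI over all grids with a * b < B, B = n ** alpha."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    B = len(x) ** alpha                          # assumed grid-size bound
    best = 0.0
    for a in range(2, int(B // 2) + 1):
        for b in range(2, int(B // a) + 1):
            if a * b >= B:
                continue                         # enforce a * b < B as in Eq. (8)
            mi = grid_mutual_information(x, y, a, b)
            best = max(best, mi / np.log2(min(a, b)))
    return best
```

For a noiseless nonlinear relationship such as $y = x^{2}$ the score should be close to 1, while for two independent series it should stay near 0, which is the property that makes MIC suitable for screening IMF components.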

Transformer model

The Transformer model has achieved fruitful results in fields such as natural language processing and computer vision. Compared with the traditional recurrent neural network (RNN) and its variants, long short-term memory (LSTM) and GRU, the Transformer solves two problems inherent to these models. On the one hand, the structure of the RNN suffers from long-term dependency issues: when the sequence is too long, information is lost. On the other hand, its recursive structure and continuous dependence on local information limit the parallelization of model operation. The Transformer bypasses the recursive structure of the RNN at the cost of additional computing power, thereby achieving parallel operation and avoiding information loss during transmission.

The Transformer model adopts an Encoder–Decoder structure. It encodes the input time series, performs a series of iterative operations, and decodes to obtain the final result. Unlike the inherent loop structure of the RNN, the Transformer performs its computation with a self-attention mechanism that imitates human attention. The scaled dot-product attention function can be described as mapping a query and a set of key-value pairs to an output:

$$\begin{aligned} Q &= W_{q} X_{in} \\ K &= W_{k} X_{in} \\ V &= W_{v} X_{in} \end{aligned} \tag{9}$$

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d_{k}}}\right) V \tag{10}$$

where $W_{q}$, $W_{k}$, and $W_{v}$ are the linear transformation matrices of the input $X_{in}$, and $d_{k}$ is the dimension of the keys. In the encoder, the IES load time series is transformed linearly to obtain $Q$, $K$, and $V$. Of the two attention modules in the decoder, the first is the same as that of the encoder. In the second module, however, $K$ and $V$ come from the hidden information matrix output by the encoder, and $Q$ comes from the output of the first multi-head attention module in the current decoding layer.

Multi-head attention applies several different linear transformations to $Q$, $K$, and $V$ and concatenates the individual attention results. It gives the model stronger feature extraction capabilities for the same number of parameters.

$$\mathrm{head}_{i} = \mathrm{Attention}(Q W_{i}^{Q}, K W_{i}^{K}, V W_{i}^{V}) \tag{11}$$

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_{1}, \dots, \mathrm{head}_{h})\, W^{O} \tag{12}$$

where $W_{i}^{Q}$, $W_{i}^{K}$, and $W_{i}^{V}$ are the projection matrices of the $i$th head and $W^{O}$ is the output projection applied to the concatenated heads (Vaswani et al., 2017). In the multi-head attention module of the encoding layer and in the first multi-head attention module of the decoding layer, $Q$, $K$, and $V$ all come from the same source, which means that these are self-attention modules.
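The attention computation described above can be sketched with NumPy as follows. This is a minimal illustration rather than the model used in the paper: it uses the row-vector convention (time steps as rows, so projections are written `X @ W`), follows Vaswani et al. (2017) in applying the output projection after concatenating the heads, and omits masking, dropout, residual connections, and layer normalization. The function names and the dimensions in the usage note are hypothetical.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention; returns the output and the weight matrix."""
    d_k = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))
    return weights @ V, weights

def multi_head_attention(X_q, X_kv, W_q, W_k, W_v, W_o):
    """Project into each head, attend, concatenate the heads, then project.

    X_q supplies the queries and X_kv the keys and values. Passing the same
    array for both gives self-attention; passing the encoder output as X_kv
    gives the second (cross-attention) module of the decoder.
    """
    heads = []
    for Wq_i, Wk_i, Wv_i in zip(W_q, W_k, W_v):
        head, _ = attention(X_q @ Wq_i, X_kv @ Wk_i, X_kv @ Wv_i)
        heads.append(head)
    return np.concatenate(heads, axis=-1) @ W_o
```

For example, with a window of 96 time steps, a model dimension of 16, and 4 heads of size 4, `W_q`, `W_k`, and `W_v` would each have shape `(4, 16, 4)` and `W_o` shape `(16, 16)`, and the output keeps the input shape `(96, 16)`.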

Proposed model

In previous studies, researchers used the original sequence as input, so the encoding process operated on sequences with high complexity and strong randomness. Here, the ICEEMDAN algorithm is used to reduce component noise, enhance the physical meaning of the components, and effectively process non-stationary data sequences. As shown in Figure 2, the decomposed multivariate IMFs are used as the input of the Encoder, with a shape of $L \times 1 \times i$, and a single IMF is used as the input of the Decoder, with a shape of $L \times i$. After passing through multiple Encoder layers containing multi-head attention mechanisms and feedforward modules, multivariate load coupling is achieved by sharing $Q$ and $K$ across the multiple Encoder ends.

Figure 2.

Transformer prediction model and integrated energy system (IES) data shape.

The computed multi-head self-attention output is passed through multiple feedforward layers and residual connections in the Decoder, and the final output is obtained through the final linear layer. This operation is repeated until all IMFs have been predicted, and the predicted components are then reconstructed to obtain the final prediction.

We name the hybrid model proposed in this article ICEEMDAN-MIC-Transformer and use a multivariate load energy dataset as input to predict future regional short-term loads. The main steps of this model are as follows:

step 1 (Sequence decomposition)

The preprocessed multivariate energy load dataset is decomposed into sub sequences of IMFs from high frequency to low frequency using the ICEEMDAN algorithm.

step 2 (Correlation analysis)

The decomposed IMF subsequences were subjected to correlation analysis by the MIC algorithm.

step 3 (Sequence recombination)

Reconstruct sequences with high similarity, where multivariate loads are used as inputs to the encoder of the prediction model, and single loads are used as inputs to the decoder.

step 4 (Training and forecasting)

Divide the dataset into training, validation, and testing sets in proportion, and obtain the results for each component.

step 5 (Result synthesis)

Sum up all the predicted sequences to obtain the final IES load result.

Data analysis and index description

The IES load dataset comes from the Campus Metabolism system, an interactive web tool (Arizona State University, n.d.). Through this system, the consumption of regional-level IES loads at Arizona State University (ASU) can be viewed in real time. This includes the residential load of students and faculty, the load of administrative, teaching, and experimental buildings, and the load of parking buildings. These loads comprise electric load, cold load, and heat load. ASU has a central plant with several natural-gas burners that supply hot water to the buildings on the Tempe campus. In addition, the central plant has large chillers for cooling the water supplied to the buildings. The heat energy value in the IES load data is calculated from the temperature drop and flow rate of the hot water entering and leaving a building, and the cold energy value is calculated from the temperature rise and flow rate of the chilled water entering and leaving a building.

In the process of collecting IES load data, problems such as data loss and anomalies are inevitable and can distort the IES load dataset. A distorted dataset will in turn degrade the load forecasting results. Therefore, the IES load dataset must be preprocessed before entering the load forecasting model, including missing value filling, outlier handling, visualization analysis, stationarity analysis, and feature construction. The preprocessing workflow for multivariate load data is shown in Figure 3.

Figure 3.

Preprocessing process of multivariate load data.

Outlier handling refers to identifying and removing values in the load data that deviate significantly from the normal range. Common methods include the $Z$-score, the interquartile range, and the 3-sigma principle. When missing data break the continuity of the load series, the missing values must be filled, usually by linear filling or similar-day filling. In addition, stationary time series yield more effective and feasible results in prediction tasks, while non-stationary time series are prone to "false prediction" problems. Therefore, the augmented Dickey-Fuller (ADF) test and the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test are generally used for stationarity analysis.
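Two of the preprocessing steps named above, 3-sigma outlier handling and linear filling, can be sketched with NumPy as follows. This is an illustrative sketch only: the paper does not state which of the listed methods it applied, and the choice to treat detected outliers as missing values before interpolation, as well as the function names, are assumptions.

```python
import numpy as np

def three_sigma_mask(load, n_sigma=3.0):
    """Flag samples farther than n_sigma standard deviations from the mean."""
    load = np.asarray(load, dtype=float)
    mean, std = np.nanmean(load), np.nanstd(load)   # ignore existing gaps
    return np.abs(load - mean) > n_sigma * std

def fill_linear(load):
    """Fill NaN gaps by linear interpolation between the nearest valid samples."""
    load = np.asarray(load, dtype=float).copy()
    missing = np.isnan(load)
    idx = np.arange(len(load))
    load[missing] = np.interp(idx[missing], idx[~missing], load[~missing])
    return load

def preprocess(load):
    """Mark 3-sigma outliers as missing, then fill every gap linearly."""
    load = np.asarray(load, dtype=float).copy()
    load[three_sigma_mask(load)] = np.nan
    return fill_linear(load)
```

On hourly data, `preprocess` would be applied to each load column separately, since the electric, cold, and heat loads have very different scales.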

Data analysis

The dataset contains three-year IES load data with hourly granularity. Therefore, the length of the dataset is $24 \times 365 \times 2 + 24 \times 366 = 26304$ , and the shape of the dataset is 26304 $\times$ 3. The IES load sequence is shown in Figure 4 and the overview of the dataset is shown in Table 1.

Figure 4.

Integrated energy system (IES) load series of Arizona State University in the United States.

Table 1.

Overview of integrated energy system (IES) dataset.

Categories	Number	Maximum	Minimum	Mean	Std
Electricity	26304	35232.21	9978.83	18360.49	4151.65
Cold	26304	17614.44	1019.66	6791.61	3817.95
Heat	26304	18.02	2.92	6.74	2.57

From the preprocessed multivariate energy load data, it can be seen that the trend of the cold load is similar to that of the electricity load and opposite to that of the heat load. This is in line with the load characteristics of regional-level IES.

In fact, most load forecasting models add relevant physical information to the dataset, such as temperature, humidity, and geographic climate, in addition to the target load sequence. However, through our testing, we found that the physical information mentioned above does not significantly improve the predictive performance of the model proposed in this paper, but it can lead to a significant increase in the computational power requirements of the model. Therefore, following the principle of Occam’s Razor, we choose not to use datasets containing physical information and instead opt for datasets containing only multivariate loads.

The dataset used in this article is a 1-hour granularity dataset, covering a three-year period from January 1, 2019 to December 31, 2021. We selected 17544 time points from January 1, 2019 to December 31, 2020 as the training and validation sets, and the entire year of 2021 as the testing set, as shown in Table 2.

Table 2.

Dataset partitioning situation.

Type	Description	Number
Training and validation dataset	From January 1, 2019 to December 31, 2020	17544 $\times$ 3
Test dataset	From January 1, 2021 to December 31, 2021	8760 $\times$ 3
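The sample counts in Table 2 follow directly from the hourly granularity and the calendar, including the leap year 2020. A short check using only the Python standard library (the helper name is hypothetical):

```python
from datetime import datetime

def hours_between(start, end):
    """Number of hourly samples in the half-open interval [start, end)."""
    return int((end - start).total_seconds() // 3600)

# 2019 has 365 days and 2020 has 366 days (leap year).
n_train_val = hours_between(datetime(2019, 1, 1), datetime(2021, 1, 1))  # 17544
n_test = hours_between(datetime(2021, 1, 1), datetime(2022, 1, 1))       # 8760
n_total = n_train_val + n_test                                           # 26304
```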

Load decomposition and correlation analysis

Because load forecasting is carried out step by step, the electricity load of the first month is decomposed first, and the decomposition results are shown in Figure 5. The length of this sequence is $24 \times 30 = 720$.

Figure 5.

Partial electric load sequence after improved complete ensemble empirical mode decomposition with adaptive noise (ICEEMDAN) decomposition.

The ICEEMDAN algorithm adaptively decomposes the original signal into six IMFs, with frequencies decreasing sequentially. The first five IMFs have different physical meanings, and the last IMF represents the overall trend, here an upward trend in regional electricity load during the first month. Then, the cold load signal and the heat load signal are separately decomposed by the ICEEMDAN algorithm to obtain their respective IMFs. MIC is then used to perform correlation analysis on each component, and the resulting heatmap is shown in Figure 6.

Figure 6.

IES load heatmap under MIC correlation analysis. MIC: maximum information coefficient; IES: integrated energy system.

From the heat map, it can be seen that the correlation between the trend components of a single load is 1, which means that multivariate loads exhibit strong correlation in the trend. The correlation between IMF components with lower frequencies and their respective trend components is relatively high, generally greater than 0.8. As the frequency increases, the correlation rapidly decreases. There is almost no correlation between the highest frequency IMF components.

The selection of screening threshold for MIC is a problem worthy of in-depth research. We have consulted a large amount of literature, and generally setting the threshold of MIC between 0.5 and 0.6 is considered reasonable. In order to improve the readability of this article and the credibility of threshold settings, we changed the MIC screening threshold and used the model proposed in this article for single step prediction. We obtained the prediction results of the multivariate factor load forecasting model under different thresholds, as shown in Figure 7.

Figure 7.

Maximum information coefficient (MIC) threshold search.

From Figure 7, it can be seen that the electric load and cold load achieve their best results when the MIC threshold is 0.5, while the heat load achieves its best results at MIC thresholds of 0.45 and 0.55. The prediction results of the three loads almost all degrade when the MIC threshold is lowered or raised. In summary, it is reasonable to set the MIC threshold to 0.5 for the multivariate load forecasting model. Therefore, we reconstructed these IMF components into trend components, high-frequency components (with scores greater than or equal to 0.5), and low-frequency components (with scores below 0.5).
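The threshold-based regrouping can be sketched as follows. This is a hedged illustration, not the authors' code: it assumes that the last IMF is the trend, that one MIC score is available per IMF (how each score is computed, for example against which reference series, is not specified here), and that the IMFs in each group are summed. The function name and the neutral group labels are hypothetical.

```python
import numpy as np

def regroup_imfs(imfs, mic_scores, threshold=0.5):
    """Split IMFs into a trend and two groups according to their MIC scores.

    imfs:       array of shape (n_imfs, length), ordered from high to low
                frequency, with the trend component last.
    mic_scores: one MIC score per IMF (the trend's own score is ignored).
    """
    imfs = np.asarray(imfs, dtype=float)
    mic_scores = np.asarray(mic_scores, dtype=float)
    trend = imfs[-1]
    others, scores = imfs[:-1], mic_scores[:-1]
    above = others[scores >= threshold].sum(axis=0)   # scores >= threshold
    below = others[scores < threshold].sum(axis=0)    # scores < threshold
    return {"trend": trend, "above_threshold": above, "below_threshold": below}
```

Since every IMF lands in exactly one group, the three regrouped components still sum to the same signal as the original IMFs, so no information is lost by the regrouping itself.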

To further demonstrate the applicability of ICEEMDAN in IES multivariate load forecasting, we added two types of decomposition algorithms, EEMD and CEEMDAN, and conducted two experiments on reconstruction error and modal aliasing. Due to limited space in the main text, the decomposition results of EEMD and CEEMDAN can be found in Figure 13 and Figure 14 in the Appendix.

Firstly, we reconstruct all the IMFs obtained by the three decomposition algorithms. By comparing with the original signal, it can be concluded that only the reconstructed EEMD signal has occasional errors of the order of $10^{- 12}$ with the original signal. The ICEEMDAN and CEEMDAN decomposition algorithms have almost no reconstruction error, that is, there is almost no information loss problem before and after signal decomposition.

Then, we perform modal aliasing analysis on all IMF components obtained from the three decomposition algorithms to determine whether each IMF component lies in a single frequency band. The analysis results are shown in Table 3.

Table 3.

Modal aliasing analysis of the IMF components obtained by each decomposition algorithm.

Decomposition algorithm	IMF component	Obvious mode mixing exists	Frequency components (Hz)
EEMD	IMF1	✓	120 and 180
	IMF2	✓	56 and 60
	IMF3	✓	30 and 31
	IMF4	✓	8 and 13
	IMF5	✓	5 and 8
	IMF6	✕	3
	IMF7	✕	2
	IMF8	✕	1
CEEMDAN	IMF1	✓	90 and 30
	IMF2	✕	60
	IMF3	✕	11
	IMF4	✕	8
	IMF5	✕	5
	IMF6	✕	1
ICEEMDAN	IMF1	✕	30
	IMF2	✕	60
	IMF3	✕	11
	IMF4	✕	8
	IMF5	✕	5
	IMF6	✕	1

ICEEMDAN: improved complete ensemble empirical mode decomposition with adaptive noise; CEEMDAN: complete ensemble empirical mode decomposition with adaptive noise; EEMD: ensemble empirical mode decomposition; IMF: intrinsic mode function.

From Table 3, it can be seen that there is a significant mode mixing phenomenon in some IMF components of EEMD and CEEMDAN. The modal aliasing in EEMD is quite severe, with at least two frequency components present in each of IMF1 to IMF5, while in CEEMDAN it appears only in IMF1. In contrast, no obvious mode mixing was detected in any IMF component of ICEEMDAN; that is, all of its IMF components lie in independent frequency bands. In summary, the ICEEMDAN algorithm separates the original signal well, can better characterize multivariate information, and is better suited to IES multivariate load data.

To further quantify the improvement of separability by MIC grouping, we calculated the signal-to-noise ratio (SNR) after MIC grouping. The IMFs with MIC scores above the set threshold are treated as the signal, and the IMFs with MIC scores below the threshold are treated as the noise. The SNR is calculated as follows:

$$\mathrm{SNR}_{dB} = 10 \cdot \log_{10}\left(\frac{P_{signal}}{P_{noise}}\right) \tag{13}$$

where $P_{signal}$ is the average power of the signal and $P_{noise}$ is the average power of the noise.
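Eq. (13) can be evaluated directly once the signal and noise parts have been formed. The sketch below assumes that average power means the mean of the squared samples, which is the usual definition for a discrete series; the function name is hypothetical.

```python
import numpy as np

def snr_db(signal, noise):
    """Signal-to-noise ratio in dB from the average power of each series."""
    p_signal = np.mean(np.square(np.asarray(signal, dtype=float)))
    p_noise = np.mean(np.square(np.asarray(noise, dtype=float)))
    return 10.0 * np.log10(p_signal / p_noise)
```

For instance, a signal whose amplitude is ten times that of the noise has one hundred times the power, which corresponds to 20 dB.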

From the SNR results, it can be seen that the original signal contains a large amount of noise. For example, when reconstructing the IMF7 of heat load and the IMF6 of cold load through the MIC algorithm, the SNR values of the original electric load, cold load, and heat load are 25.41, 11.65, and 16.37 dB, respectively. When reconstructing the IMF5 of electric load through the MIC algorithm, the SNR values of the original electric load, cold load, and heat load are 23.98, 11.18, and 15.19 dB, respectively. The calculation process and results of other data are similar to the above, all below 30 dB. Obviously, the SNR values mentioned above are far below the threshold requirements of normal signals, indicating that signal filtering through the MIC algorithm is very necessary.

In addition to calculating the SNR value, we also quantified the improvement in separability of MIC grouping by comparing the MI values before and after reconstruction. According to formula (7), the MI values between the IMF7 of heat load and the electric load, cold load, and heat load before and after reconstruction are 4.94 bits and 5.02 bits, 4.87 bits and 5.02 bits, and 4.23 bits and 4.98 bits, respectively. The MI values between the IMF6 of cold load and the electric load, cold load, and heat load before and after reconstruction are 3.88 bits and 6.38 bits, 6.04 bits and 6.38 bits, and 4.03 bits and 5.61 bits, respectively. The calculation process and results for the other data are similar. The MI value between the reconstructed signal filtered by MIC and the prediction target signal is thus improved to a certain extent, which means that the correlation between the two signals is stronger and that, in theory, the predictive performance of the prediction model can be improved to a certain extent.

Evaluating index

To verify the predictive performance of the model proposed in this paper, the mean absolute percentage error (MAPE) was used to measure the error. This evaluation index measures the deviation between the predicted results and the actual loads. In both the horizontal comparison experiments and the vertical ablation experiments, a lower MAPE indicates better predictive performance. In order to present the predictive results of the model from multiple perspectives, we also added three evaluation metrics: root mean square error (RMSE), mean absolute error (MAE), and mean absolute scaled error (MASE).

For load forecasting values $\hat{y} = \{\hat{y}_{1}, \dots, \hat{y}_{n}\}$ and actual load values $y = \{y_{1}, \dots, y_{n}\}$, the error calculation formulas are as follows.

$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_{i} - \hat{y}_{i}}{y_{i}} \right| \tag{14}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}} \tag{15}$$

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_{i} - \hat{y}_{i} \right| \tag{16}$$

$$\mathrm{MASE} = \frac{\frac{1}{n} \sum_{i=1}^{n} \left| y_{i} - \hat{y}_{i} \right|}{\frac{1}{n-1} \sum_{i=2}^{n} \left| y_{i} - y_{i-1} \right|} \tag{17}$$
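The four metrics in Eqs. (14) to (17) translate directly into NumPy. The sketch below is a plain transcription of the formulas with hypothetical function names; MAPE is returned in percent, and MASE scales the MAE by the mean absolute one-step difference of the actual series, as in the denominator of Eq. (17).

```python
import numpy as np

def mape(y, y_hat):
    """Mean absolute percentage error in percent, Eq. (14)."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return 100.0 * np.mean(np.abs((y - y_hat) / y))

def rmse(y, y_hat):
    """Root mean square error, Eq. (15)."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mae(y, y_hat):
    """Mean absolute error, Eq. (16)."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return np.mean(np.abs(y - y_hat))

def mase(y, y_hat):
    """Mean absolute scaled error, Eq. (17): MAE over the naive one-step MAE."""
    y = np.asarray(y, dtype=float)
    return mae(y, y_hat) / np.mean(np.abs(np.diff(y)))
```

As a small worked example, for actual loads `[100, 110, 120, 130]` and forecasts `[110, 110, 110, 130]` the absolute errors are 10, 0, 10, and 0, so the MAE is 5, the RMSE is $\sqrt{50} \approx 7.07$, the MAPE is about 4.58%, and the MASE is $5 / 10 = 0.5$.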

Result analysis

Configuration and forecast results

All experiments in the paper were conducted on a single computer with the same configuration. In terms of hardware, the CPU is an Intel Core i5-12490F at 3.00 GHz, the GPU is an NVIDIA GeForce RTX 4060Ti, and the memory is 16 GB. In terms of software, we used Python 3.8, and the deep learning models were built with Python 1.1.1 and TensorFlow 2.4.1.

In order to ensure that the current hyperparameters of the studied model are optimal in the calculation example, the grid search method is used to optimize the hyperparameters. The optimization objectives, scope, and optimal parameters are shown in Table 4.

Table 4.

Model hyperparameters optimization.

Hyperparameter	Optimization scope	Optimal value
Encoder feedforward size	[128,256,512,1024]	512
Encoder multi-head num	[4,6,8]	8
Encoder output dimension	[128,256,512,1024]	512
Encoder layer number	[4,6,8]	8
Encoder dropout	[0.05,0.1,0.15,0.2]	0.15
Decoder feedforward size	[128,256,512,1024]	512
Decoder multi-head num	[4,6,8]	8
Decoder output dimension	[128,256,512,1024]	512
Decoder layer number	[4,6,8]	8
Decoder dropout	[0.05,0.1,0.15,0.2]	0.1

According to different prediction requirements, the sliding window length and prediction step size of the prediction model should also be changed accordingly. As shown in Figure 8, for an energy load sequence with a granularity of hours, if the sliding window length is set to 96 and the prediction step size is set to 1, it means that in each iteration of the model, the 4-day data from the historical energy load sequence is used to predict the trend of energy load for the next hour.

Figure 8.

Schematic diagram of sliding prediction window.
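The sliding window scheme can be sketched as follows, assuming the load series is a one-dimensional sequence of hourly values. The function name `make_windows` and the placeholder data are assumptions for illustration.

```python
def make_windows(series, window=96, horizon=1):
    """Split a 1-D sequence into (input window, target) pairs.

    Each input holds `window` consecutive values; the target holds the
    `horizon` values that immediately follow it.
    """
    samples = []
    for start in range(len(series) - window - horizon + 1):
        x = series[start:start + window]
        y = series[start + window:start + window + horizon]
        samples.append((x, y))
    return samples

# Placeholder hourly data (not the ASU dataset): 200 consecutive hours.
hourly_load = list(range(200))
pairs = make_windows(hourly_load, window=96, horizon=1)
```

With a window of 96 and a step size of 1, each sample uses 96 hours (4 days) of history to predict the following hour, and consecutive samples are shifted by one hour. Setting `horizon` to 2, 3, or 4 would give the longer step lengths used in the comparison experiments.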

The model is trained for up to 200 epochs. Training stops early if the MAPE of the current epoch does not fall below the current best MAPE within 10 consecutive epochs.
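The stopping rule can be sketched as a simple loop with a patience counter. The callback `validate_epoch` is a hypothetical placeholder for training one epoch and returning the monitored MAPE; the toy curve only demonstrates the control flow.

```python
def train_with_early_stopping(validate_epoch, max_epochs=200, patience=10):
    """Run up to `max_epochs`; stop after `patience` epochs without improvement.

    `validate_epoch(epoch)` is a hypothetical callback that trains one epoch
    and returns the monitored MAPE.
    """
    best_mape, best_epoch = float("inf"), -1
    for epoch in range(max_epochs):
        current = validate_epoch(epoch)
        if current < best_mape:
            best_mape, best_epoch = current, epoch
        elif epoch - best_epoch >= patience:
            break
    return best_mape, best_epoch, epoch + 1

def toy_curve(epoch):
    # Toy curve: MAPE falls until epoch 30, then plateaus at a higher value.
    return 5.0 - 0.1 * epoch if epoch <= 30 else 2.5

best_mape, best_epoch, epochs_run = train_with_early_stopping(toy_curve)
```

In this toy run the best value is reached at epoch 30 and training stops after 10 further epochs without improvement, well before the 200-epoch limit.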

The proposed model and its ablation models (Transformer-type deep learning models) are optimized using the grid search algorithm, while the other deep learning models and machine learning models are optimized using the particle swarm optimization algorithm. Under the model parameters shown in Table 5, the model studied in this paper is used to perform load forecasting on the ASU IES dataset. Mean reconstruction is then performed on the predicted results of the electricity load, cold load, and heat load, respectively. The prediction results are shown in Figure 9.

Figure 9.

The prediction results of the proposed model.

Table 5.

Optimization of hyperparameters in comparative models.

Models	Hyperparameter setting	Attribution
XGBoost	The learning rate is 0.01, the maximum depth of the tree is 6, the number of iterations is 100, the feature sampling ratio is 0.95, the L1 regularization parameter is 0.1, the L2 regularization parameter is 0.15, the splitting loss reduction threshold is 0.1, and the minimum node weight threshold is 0.1	ML
CNN	The learning rate is 0.001, the number of convolution layers is 1, the number of filters in the convolution layer is 48, the convolution kernel size is 2, the sampling step size is 1, the number of fully connected layers is 2, the discard layer parameter is 0.15, and the activation function is relu	DL
TCN	The learning rate is 0.001, the number of convolution layers is 1, the number of filters in the convolution layer is 48, the convolution kernel size is 2, the expansion coefficient is [1,2,4,8,16,32], the sampling step size is 1, the number of fully connected layers is 2, the discarded layer parameter is 0.1, and the activation function is relu	DL
GRU	The learning rate is 0.001, the number of hidden layers is 1, the number of hidden layer nodes is 128, the number of fully connected layers is 2, the discarded layer parameter is 0.1, and the activation function is tanh	DL
LSTM	The learning rate is 0.001, the number of hidden layers is 1, the number of hidden layer nodes is 128, the number of fully connected layers is 2, the discarded layer parameter is 0.1, and the activation function is tanh	DL
SVR	The kernel is RBF, and other model parameters are the default parameters	ML
S-Transformer	The model parameters are the same as the model studied in this paper	DL

CNN: convolutional neural network; GRU: gated recurrent unit; LSTM: long short-term memory.

From Figure 9(a) to (c), it can be seen that when the prediction step size is 1, the reconstructed predicted value curve fits the true value curve well, with only small errors appearing at the turning points. This indicates that the model studied in this paper has high predictive performance for multivariate load forecasting tasks and can, to some extent, uncover the hidden regularities in the original energy data, which gives it engineering application value.

In addition, to further explore the predictive performance of the studied model on real datasets, this paper also fits normal distributions to the prediction errors, as shown in Figure 9(d). The prediction error of the electricity load is relatively concentrated, while the prediction errors of the cold load and heat load are more dispersed. However, the expectation of all three error distributions is close to zero. Therefore, when the model is used for load prediction, the prediction error is not expected to deviate far from zero, which helps IES energy suppliers make reasonable scheduling arrangements.

In order to further analyze the prediction results, we conducted residual diagnosis on the multivariate load forecasting results, including autocorrelation test and heteroscedasticity test.

Firstly, we conducted an autocorrelation test on the results of multivariate load forecasting. Autocorrelation measures the statistical dependency between observations of the same variable at different time points. The results are shown in Figure 9(e).

From Figure 9(e), it can be seen that the autocorrelation of the heat load is the strongest, followed by the cold load and the electric load. This is related to users' energy consumption habits, as heat and cold loads are more strongly affected by climate factors. When the climate changes periodically with the seasons, the energy consumption of heat and cold loads also changes periodically. In contrast, users' electricity consumption is less regular and less affected by climate factors, resulting in a smaller autoregressive coefficient.
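For reference, the sample autocorrelation coefficient at a given lag can be computed as below. This is a minimal sketch in plain Python; the function name and the synthetic periodic series are assumptions for illustration and do not reproduce Figure 9(e).

```python
import math

def autocorrelation(series, max_lag):
    """Sample autocorrelation coefficients for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [v - mean for v in series]
    denom = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t + lag] for t in range(n - lag)) / denom
        for lag in range(max_lag + 1)
    ]

# Synthetic series with a 24-hour cycle (placeholder, not measured load data).
periodic = [math.sin(2 * math.pi * t / 24) for t in range(240)]
acf = autocorrelation(periodic, max_lag=24)
```

A strongly periodic series yields a coefficient close to 1 at a lag equal to its period and a strongly negative coefficient at half the period, which is the pattern that a regular, climate-driven load would be expected to show more clearly than an irregular one.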

Then, we tested the heteroscedasticity of the prediction model. Heteroscedasticity refers to the phenomenon in which the variance of the random error term in a regression model changes with the explanatory variables or observations. The heteroscedasticity test was conducted on the results of multivariate load forecasting using the Breusch-Pagan (BP) test. Figure 9(f) shows the BP test results for the electric load, with a test statistic of 0.9231 and a p-value of 0.3367. The BP test statistics and $p$-values for the cold load and heat load were calculated to be 0.6936 and 0.4049, and 0.7153 and 0.3881, respectively. Since all $p$-values exceed 0.05, the null hypothesis of constant error variance cannot be rejected; the residuals show no significant heteroscedasticity, and the model prediction errors are stable.
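The Lagrange multiplier form of the BP test can be sketched as follows, under the assumption of a single explanatory variable, so that the statistic is compared against a chi-square distribution with one degree of freedom. The paper does not state which explanatory variables were used, so the function name, the one-variable setup, and the synthetic data are all assumptions.

```python
import math
import random

def breusch_pagan_single(residuals, x):
    """LM form of the Breusch-Pagan test with one explanatory variable.

    Regresses squared residuals on x and returns (LM statistic, p-value),
    where LM = n * R^2 is compared with a chi-square distribution with
    1 degree of freedom under the null of constant error variance.
    """
    n = len(residuals)
    u2 = [r * r for r in residuals]
    mean_x = sum(x) / n
    mean_u2 = sum(u2) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (ui - mean_u2) for xi, ui in zip(x, u2))
    syy = sum((ui - mean_u2) ** 2 for ui in u2)
    r_squared = (sxy * sxy) / (sxx * syy)
    lm = n * r_squared
    p_value = math.erfc(math.sqrt(lm / 2.0))  # chi-square(1) upper tail
    return lm, p_value

# Synthetic residuals whose spread grows with x (clearly heteroscedastic).
rng = random.Random(0)
x = [0.1 + 5.0 * i / 1999 for i in range(2000)]
hetero = [rng.gauss(0.0, 1.0) * xi for xi in x]
lm_het, p_het = breusch_pagan_single(hetero, x)

# Upper-tail probability of the electric-load statistic under one degree of freedom.
p_electric = math.erfc(math.sqrt(0.9231 / 2.0))
```

For the synthetic heteroscedastic residuals the p-value is far below 0.05, so constant variance is rejected. Under the one-degree-of-freedom assumption, the electric-load statistic of 0.9231 gives a tail probability of about 0.337, in line with the reported p-value.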

Through multiple repeated experiments, we obtained the final experimental results and used the mean as the main evaluation metric for the predictive performance of the prediction model. In addition, the mean, maximum, minimum, first quartile, median, and third quartile of the results for the electric load, cold load, and heat load on the training and validation sets are shown in Figure 10.

Figure 10.

Comparison of the proposed model on the training set and validation set.

Horizontal comparison experiment

To verify the superiority of the model studied in this paper over other algorithms, widely used, well-performing deep learning models and classical machine learning models are selected as benchmark models for the simulation experiments. It is worth mentioning that much of the literature shows that hybrid deep learning models outperform single deep learning models in most cases. Accordingly, the benchmark models selected in this paper fall into the following two groups:

Single models that contain only one algorithm: LSTM, GRU, CNN, TCN, XGBoost, SVR and S-Transformer (a Transformer model for single-load forecasting, with single-sequence input and single-sequence output). Because such algorithms can form new model types through stacking and bidirectional forms (such as stacked LSTM and Bi-LSTM models), which would affect the prediction results, only single-layer unidirectional models are used in the simulation experiments.

Hybrid deep learning models that include a feature extraction module: CNN-LSTM, CNN-GRU, TCN-LSTM and TCN-GRU. These algorithms first use a convolution model to extract features from the original sequence, capturing sequence information through multidimensional mapping, and then use a deep learning model (LSTM or GRU) for load forecasting.

To verify the superiority of the proposed model over the single benchmark models and the hybrid deep learning models, this paper uses the ASU IES dataset to carry out a horizontal comparison experiment on the benchmark models listed above. To ensure the reliability of the horizontal comparison experiment, the same environmental parameter settings are used for all simulation experiments.

Under the hyperparameter settings shown in Table 5, the comparative experimental results of the model studied in this paper and the single benchmark models at different prediction step sizes are shown in Figure 11 and Table 6. For brevity, MAPE is used as the main evaluation metric in the remainder of the article. The results using RMSE, MAE, and MASE as evaluation indicators can be found in Tables 10, 11, and 12 in the Appendix.

Figure 11.

Comparison of experimental results with single models.

Table 6.

Comparison of experimental results with single models (MAPE, %).

		Step length
Predicted model	Type	1	2	3	4
Proposed	Electricity	1.41	1.73	1.79	1.87
	Cold	1.24	1.47	1.84	2.07
	Heat	1.33	1.46	1.92	2.14
XGBoost	Electricity	6.27	6.89	9.39	9.15
	Cold	4.37	5.95	10.41	10.21
	Heat	5.04	5.66	8.43	8.49
GRU	Electricity	2.29	2.34	3.01	5.44
	Cold	1.66	2.87	2.52	6.17
	Heat	1.62	2.11	2.83	5.48
LSTM	Electricity	3.56	4.19	4.99	5.66
	Cold	3.39	3.93	4.56	4.92
	Heat	2.82	3.66	5.35	6.27
TCN	Electricity	4.09	4.56	8.51	8.02
	Cold	3.35	5.62	7.14	6.98
	Heat	4.81	5.35	8.29	8.75
SVR	Electricity	7.01	6.96	5.93	7.43
	Cold	5.26	5.22	5.94	8.25
	Heat	5.76	6.51	7.21	8.29
CNN	Electricity	4.46	6.71	4.23	6.07
	Cold	4.92	5.68	4.92	6.71
	Heat	4.35	5.34	4.46	9.17
S-Transformer	Electricity	2.08	1.75	3.72	4.13
	Cold	1.74	1.99	3.72	5.42
	Heat	1.74	2.32	3.45	5.23

CNN: convolutional neural network; GRU: gated recurrent unit; LSTM: long short-term memory.

Among the single models, the model studied in this paper achieves the best results at every step length, and its reconstructed load curve fits the original curve most closely. In addition, the MAPE of many models, such as XGBoost and CNN, grows quickly as the step size increases, indicating poor predictive stability. In contrast, the MAPE of the studied model grows slowly as the step size increases, so it has high predictive stability and can meet different prediction requirements in engineering applications.
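The stability claim can be checked with simple arithmetic on Table 6. The sketch below uses the electricity-load MAPE values of four of the models and computes the increase from step length 1 to step length 4; the variable and function names are illustrative.

```python
# Electricity-load MAPE (%) at step lengths 1 to 4, copied from Table 6.
mape_by_step = {
    "Proposed": [1.41, 1.73, 1.79, 1.87],
    "XGBoost": [6.27, 6.89, 9.39, 9.15],
    "CNN": [4.46, 6.71, 4.23, 6.07],
    "GRU": [2.29, 2.34, 3.01, 5.44],
}

def growth(values):
    """Absolute MAPE increase (percentage points) from step 1 to step 4."""
    return round(values[-1] - values[0], 2)

increase = {name: growth(values) for name, values in mape_by_step.items()}
most_stable = min(increase, key=increase.get)
```

For the electricity load, the proposed model's MAPE rises by 0.46 percentage points between step lengths 1 and 4, compared with 2.88 for XGBoost, 1.61 for CNN, and 3.15 for GRU.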

In addition, the comparative experimental results of the studied model and each hybrid benchmark model under optimal parameters and at different prediction steps are shown in Figure 12 and Table 7. It is worth mentioning that, because the amplitudes of the electric load, cold load and heat load differ greatly, directly superimposing the multivariate loads would give the electric load and cold load too much weight, which is unreasonable. Therefore, the total load result used here is the mean of the multivariate load forecasting results, so that all load types carry the same weight.

Figure 12.

Comparison of experimental results with hybrid models.

Figure 13.

Partial electric load sequence after ensemble empirical mode decomposition (EEMD) decomposition.

Figure 14.

Partial electric load sequence after complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) decomposition.

Table 7.

Comparison of experimental results with hybrid models (MAPE, %).

		Step length
Predicted model	Type	1	2	3	4
Proposed	Electricity	1.41	1.73	1.79	1.87
	Cold	1.24	1.47	1.84	2.07
	Heat	1.33	1.46	1.92	2.14
CNN-GRU	Electricity	2.48	4.53	5.07	4.89
	Cold	3.12	4.06	4.68	4.44
	Heat	2.58	3.49	4.52	4.95
CNN-LSTM	Electricity	2.57	3.43	3.84	4.63
	Cold	2.77	3.04	3.25	4.76
	Heat	2.56	3.72	3.50	4.92
TCN-GRU	Electricity	1.56	1.88	2.15	2.48
	Cold	1.67	1.56	2.26	2.34
	Heat	1.54	1.75	2.09	2.86
TCN-LSTM	Electricity	1.88	2.31	3.19	5.87
	Cold	2.11	2.59	3.06	6.64
	Heat	2.04	2.15	2.41	6.75

CNN: convolutional neural network; GRU: gated recurrent unit; LSTM: long short-term memory.

It can be seen from Figure 12 and Table 7 that the model studied in this paper also achieves the best results among the hybrid models at all prediction steps. This indicates that, compared with the traditional approach of extracting information from the original sequence with a convolution model, the proposed model can effectively extract and use the feature information of the original sequence through the high-complexity parameter matrices of its hidden layers. Therefore, the model studied in this paper also has a performance advantage over the hybrid benchmark models.

In order to better represent the training cost of our model, we measured the training/inference cost of all compared models. The training/inference cost is the total duration from the start to the end of the model training, as shown in Table 8. It is worth mentioning that only the multivariate Transformer class prediction models can output results at the same time, while other prediction models have a sequence of multivariate load output times (such as the heat load being calculated first, and the electric load and cold load being calculated later). Therefore, the training/inference cost is based on the calculation time of the last type of load.

Table 8.

Model training time comparison.

	Training time
Model	Total	Electricity	Cold	Heat
Proposed	05:25:13	05:25:13	05:25:13	05:25:13
XGBoost	00:05:43	00:05:32	00:05:43	00:05:41
GRU	05:32:25	05:32:25	05:29:58	05:30:02
LSTM	05:22:58	05:20:47	05:18:25	05:22:58
TCN	04:23:33	04:23:33	04:21:30	04:21:20
SVR	00:04:03	00:04:02	00:04:03	00:03:49
CNN	04:21:49	04:20:03	04:17:28	04:21:49
S-Transformer	03:37:20	03:31:48	03:37:20	03:37:01

CNN: convolutional neural network; GRU: gated recurrent unit; LSTM: long short-term memory.

To further evaluate the significance of the performance advantage of the proposed model, we conducted a significance test using the Diebold-Mariano (DM) method. The DM test evaluates significance by examining the sequence of differences between the prediction errors of two models. The hypotheses of the DM test are as follows:

Null hypothesis $H_{0}$ : There is no significant difference in prediction accuracy between the two models;

Alternative hypothesis $H_{1}$ : There is a significant difference in the prediction accuracy between the two models.

The prediction sequences of the proposed model were compared with those of each of the other models using the DM significance test. All results showed $p \leq 0.05$, so the null hypothesis $H_{0}$ is rejected, indicating a significant difference in prediction accuracy between the proposed model and each benchmark model. The evaluation indicators mentioned earlier therefore reliably reflect the performance of the prediction models.
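A basic version of the DM test can be sketched as follows. The paper does not specify the loss function or the variance estimator, so the squared-error loss, the absence of an autocovariance correction (suitable only for one-step-ahead forecasts), and the synthetic error sequences are assumptions made for illustration.

```python
import math
import random

def diebold_mariano(errors_a, errors_b):
    """Two-sided DM test for one-step forecasts with squared-error loss.

    Returns (DM statistic, p-value). A negative statistic means model A has
    the smaller average loss. No autocovariance correction is applied.
    """
    d = [ea * ea - eb * eb for ea, eb in zip(errors_a, errors_b)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((di - mean_d) ** 2 for di in d) / n
    dm = mean_d / math.sqrt(var_d / n)
    p_value = math.erfc(abs(dm) / math.sqrt(2.0))  # two-sided normal tail
    return dm, p_value

# Synthetic forecast errors: model A is clearly more accurate than model B.
rng = random.Random(1)
errors_a = [rng.gauss(0.0, 1.0) for _ in range(500)]
errors_b = [rng.gauss(0.0, 3.0) for _ in range(500)]
dm_stat, dm_p = diebold_mariano(errors_a, errors_b)
```

Here the loss differential is strongly negative, so the p-value falls below 0.05 and the null hypothesis of equal accuracy is rejected. For multi-step forecasts, the variance of the loss differential would normally include autocovariance terms.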

To check for data leakage, we conducted a controlled experiment on electricity load prediction in which the training set and validation set were divided randomly. The MAPE on the validation set reached 0.82%, markedly lower than the MAPE of 1.41% in the formal experiment. With a random division, validation samples are interleaved in time with training samples, so information about them is available during training and the error is artificially low. The higher error in the formal experiment supports the conclusion that there was no data leakage in its test set.

Longitudinal ablation experiment

To verify the superiority of the model studied in this paper over the naive model and the ablation models, a longitudinal ablation experiment was performed. An ablation experiment examines how each part of a model contributes to overall performance: one part of the model is removed at a time, and the change in prediction performance is checked in order to test the reliability of each module. The models used in the ablation experiment are as follows:

Model A

The model proposed in this paper.

Model B

The original IES dataset is input directly into the model without decomposition by the CEEMDAN algorithm.

Model C

After CEEMDAN decomposition, the IMF components are not subjected to correlation analysis by the MIC algorithm.

The results of the ablation experiment are shown in Table 9. Model A has the best predictive performance for all three types of loads. Model B and Model C both have lower prediction accuracy than the proposed model, which shows that each submodule of the model is effective.

Table 9.

Results of ablation experiment.

	Type
Models	Electricity	Cold	Heat
Model A	1.41	1.24	1.33
Model B	1.55	1.32	1.59
Model C	1.82	1.66	1.91
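To quantify the effect of each removed module, the relative increase in MAPE of the ablated models over Model A can be computed from Table 9. This is a small worked calculation; the variable and function names are illustrative.

```python
# MAPE (%) of each model and load type, copied from Table 9.
ablation = {
    "Model A": {"Electricity": 1.41, "Cold": 1.24, "Heat": 1.33},
    "Model B": {"Electricity": 1.55, "Cold": 1.32, "Heat": 1.59},
    "Model C": {"Electricity": 1.82, "Cold": 1.66, "Heat": 1.91},
}

def relative_increase(variant, baseline="Model A"):
    """Percent increase in MAPE of an ablated model relative to the baseline."""
    base = ablation[baseline]
    return {
        load: round(100.0 * (ablation[variant][load] - base[load]) / base[load], 1)
        for load in base
    }

increase_b = relative_increase("Model B")
increase_c = relative_increase("Model C")
```

Relative to Model A, the MAPE of Model B is about 9.9%, 6.5%, and 19.5% higher for the electricity, cold, and heat loads, while that of Model C is about 29.1%, 33.9%, and 43.6% higher. On these figures, Model C loses more accuracy than Model B for every load type.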

Table 10.

Comparison of experimental results with single models (RMSE).

		Step length
Predicted model	Type	1	2	3	4
Proposed	Electricity	264.58	324.62	335.88	350.89
	Cold	232.55	275.69	345.08	388.21
	Heat	53.53	58.76	77.27	86.13
XGBoost	Electricity	1176.52	1292.86	1761.97	1716.93
	Cold	819.56	1115.87	1952.31	1914.80
	Heat	202.84	227.79	339.27	341.69
GRU	Electricity	429.70	439.08	564.81	1020.78
	Cold	311.32	538.25	472.61	1157.13
	Heat	65.20	84.92	113.90	220.55
LSTM	Electricity	668.01	786.22	936.34	1062.06
	Cold	635.77	737.04	855.19	922.71
	Heat	113.49	147.30	215.32	252.34
TCN	Electricity	767.46	855.65	1596.84	1504.90
	Cold	628.27	1053.99	1339.05	1309.04
	Heat	193.58	215.32	333.64	352.15
SVR	Electricity	1315.38	1306.00	1112.72	1394.19
	Cold	986.47	978.97	1114.00	1547.22
	Heat	231.82	262.00	290.17	333.64
CNN	Electricity	836.89	1259.08	793.73	1138.99
	Cold	922.71	1065.24	922.71	1258.41
	Heat	175.07	214.91	179.50	369.05
S-Transformer	Electricity	390.30	328.38	698.03	774.97
	Cold	326.32	373.21	697.66	1016.48
	Heat	70.03	93.37	138.85	210.49

CNN: convolutional neural network; GRU: gated recurrent unit; RMSE: root mean square error; LSTM: long short-term memory.

Table 11.

Comparison of experimental results with single models (MAE).

		Step length
Predicted model	Type	1	2	3	4
Proposed	Electricity	263.77	323.63	334.86	349.82
	Cold	231.81	274.80	343.97	386.97
	Heat	50.92	55.90	73.51	81.94
XGBoost	Electricity	1172.94	1288.92	1756.60	1711.71
	Cold	816.93	1112.30	1946.05	1908.67
	Heat	192.97	216.71	322.77	325.06
GRU	Electricity	428.39	437.75	563.09	1017.67
	Cold	310.32	536.52	471.09	1153.42
	Heat	62.03	80.79	108.35	209.82
LSTM	Electricity	665.98	783.83	933.49	1058.83
	Cold	633.73	734.68	852.45	919.75
	Heat	107.97	140.13	204.84	240.06
TCN	Electricity	765.12	853.05	1591.98	1500.31
	Cold	626.25	1050.61	1334.76	1304.85
	Heat	184.16	204.84	317.41	335.02
SVR	Electricity	1311.37	1302.02	1109.33	1389.94
	Cold	983.31	975.83	1110.43	1542.26
	Heat	220.54	249.25	276.05	317.41
CNN	Electricity	834.34	1255.25	791.31	1135.52
	Cold	919.75	1061.82	919.75	1254.37
	Heat	166.55	204.46	170.76	351.10
S-Transformer	Electricity	389.11	327.38	695.91	772.61
	Cold	325.28	372.01	695.42	1013.22
	Heat	66.62	88.83	132.09	200.24

CNN: convolutional neural network; GRU: gated recurrent unit; MAE: mean absolute error; LSTM: long short-term memory.

Table 12.

Comparison of experimental results with single models (MASE).

		Step length
Predicted model	Type	1	2	3	4
Proposed	Electricity	0.565	0.693	0.717	0.749
	Cold	0.377	0.447	0.560	0.630
	Heat	0.219	0.240	0.316	0.352
XGBoost	Electricity	2.510	2.759	3.760	3.664
	Cold	1.330	1.811	3.169	3.108
	Heat	0.829	0.931	1.386	1.396
GRU	Electricity	0.917	0.937	1.205	2.178
	Cold	0.505	0.874	0.767	1.878
	Heat	0.266	0.347	0.465	0.901
LSTM	Electricity	1.425	1.678	1.998	2.266
	Cold	1.032	1.196	1.388	1.498
	Heat	0.464	0.602	0.880	1.030
TCN	Electricity	1.638	1.826	3.407	3.211
	Cold	1.020	1.711	2.173	2.125
	Heat	0.791	0.880	1.363	1.439
SVR	Electricity	2.807	2.787	2.374	2.975
	Cold	1.601	1.589	1.808	2.511
	Heat	0.947	1.070	1.185	1.363
CNN	Electricity	1.786	2.687	1.694	2.430
	Cold	1.498	1.729	1.498	2.042
	Heat	0.715	0.878	0.733	1.508
S-Transformer	Electricity	0.833	0.701	1.489	1.654
	Cold	0.530	0.606	1.132	1.650
	Heat	0.286	0.381	0.567	0.860

CNN: convolutional neural network; GRU: gated recurrent unit; MASE: mean absolute scaled error; LSTM: long short-term memory.

Conclusion

A short-term prediction model based on ICEEMDAN-MIC-Transformer is proposed for IES load time series with high coupling correlation. The method uses a Transformer model based on the self-attention mechanism to better capture long-term dependencies in the data. The ICEEMDAN decomposition algorithm is introduced into IES load forecasting to capture the detailed changes in the time series and extract internal information. In addition, MIC is introduced for correlation analysis of the various time series, and the IMF components are classified and then reconstructed. The reconstructed multivariate dataset and single datasets are used as inputs to the Transformer model, yielding the short-term load forecasting results. The prediction results indicate that the model can accurately predict the IES load. In the horizontal comparison experiment, the model shows better predictive performance than traditional machine learning and deep learning methods. The longitudinal ablation experiment shows that each module of the model is effective, since the predictive performance of the ablated models is lower than that of the original model. In summary, this model has research significance and application value for load forecasting under IES.

Footnotes

ORCID iD

Fan Yu

Funding

The authors received the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Science and Technology Research Program of Chongqing Municipal Education Commission (Grant No.KJZD-K202302602), the Science and Technology Research Program of Chongqing Municipal Education Commission (Grant No.KJQN202402604), the Science and Technology Research Program of Chongqing Municipal Education Commission (Grant No.KJQN202302604), the Science and Technology Research Program of Chongqing Electric Power College (Grant No.D-KY202516).

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Appendix proof of the Zonklar equations

References

Arizona State University (n.d.) Campus Metabolism System—Integrated Energy System. Available at: https://cm.asu.edu/ (Accessed: 23 August 2025).

Alofaysan, Radulescu, Dembińska, et al. (2024) The effect of digitalization and green technology innovation on energy efficiency in the European Union. Energy Exploration & Exploitation 42(5): 1747–1762.

Chowdary, Gope, Dawn, et al. (2025) Economic sustainability enhancement by the integration of renewable energy in a deregulated system: A study. Energy Exploration & Exploitation 43(2): 865–905.

Deng, Zhang, et al. (2020) Short-term electric load forecasting based on EEMD-GRU-MLR. Power System Technology 44(2): 593–602.

Gielen, Boshell, Saygin, et al. (2019) The role of renewable energy in the global energy transformation. Energy Strategy Reviews 24: 38–50.

Lai, Chang W-C, Yang, et al. (2018) Modeling long- and short-term temporal patterns with deep neural networks. In: The 41st International ACM SIGIR Conference on Research and Development in Information Retrieval, Ann Arbor, MI, USA, 8–12 July 2018, pp. 95–104. New York: ACM.

Yuan, et al. (2018) Medium-term forecasting of cold, electric and gas load in multi-energy system based on VAR model. In: 2018 13th IEEE conference on industrial electronics and applications (ICIEA), Wuhan, China, pp. 1676–1680. IEEE.

Lilhore, Dalal, Radulescu, et al. (2025) Smart grid stability prediction model using two-way attention based hybrid deep learning and MPSO. Energy Exploration & Exploitation 43(1): 142–168.

Reshef, Finucane, et al. (2011) Detecting novel associations in large data sets. Science (New York, N.Y.) 334(6062): 1518–1524.

Simaiya, Dahiya, Tomar, et al. (2024) A transfer learning-based hybrid model with lightGBM for smart grid short-term energy load prediction. Energy Exploration & Exploitation 42(5): 1853–1876.

Smeers, Martin, Aguado (2021) Co-optimization of energy and reserve with incentives to wind generation. IEEE Transactions on Power Systems 37(3): 2063–2074.

Sunder, V Paul, Punia, et al. (2024) An advanced hybrid deep learning model for accurate energy load prediction in smart building. Energy Exploration & Exploitation 42(6): 2241–2269.

Tang, Liu, Xie, et al. (2019) Short-term forecasting of electricity and gas demand in multi-energy system based on RBF-NN model. In: 2019 IEEE International conference on energy internet (ICEI), Nanjing, China, 5–9 May 2019, pp. 542–547. IEEE.

Vaswani, Shazeer, Parmar, et al. (2017) Attention is all you need. Advances in Neural Information Processing Systems 30: 5998–6008.

Wang, Ding, et al. (2022) A transformer-based method of multienergy load forecasting in integrated energy system. IEEE Transactions on Smart Grid 13(4): 2703–2714.

Wang, Zhang, et al. (2021) A multi-energy load prediction model based on deep multi-task learning and ensemble approach for regional integrated energy systems. International Journal of Electrical Power and Energy Systems 126: 106583.

Zhou, Meng, Huang, et al. (2021) Multi-energy net load forecasting for integrated local energy systems with heterogeneous prosumers. International Journal of Electrical Power and Energy Systems 126: 106542.

Short-term load forecasting of integrated energy system based on improved complete ensemble empirical mode decomposition with adaptive noise maximum information coefficient transformer
