
Abstract

Non-intrusive appliance load monitoring (NILM) decomposes the total power consumption of a power system into the consumption of its contributing appliances. Previous studies considered only the total power consumption when decomposing the load. However, total electricity consumption data also contain other important information, such as current, voltage, and time, which can be used to analyze load consumption. Therefore, we proposed a sequence-to-sequence network enhanced by an attention mechanism, which effectively integrates these external features with the total electricity consumption in grid data. We applied and evaluated the proposed model on the electricity consumption data of a gas station with 12 appliances, and our model achieved 90.5% accuracy in load decomposition. Our approach provides a new solution for applying NILM in the industrial field and helps to manage energy more rationally.

Keywords

Industrial electricity, attention mechanism, deep learning, combined encoding, non-intrusive load monitoring

Introduction

With the development of science and technology, intelligent power systems now cover a wide range of applications, including power monitoring systems and energy management systems.^1,2 Industrial electricity consumption accounts for a large share of global electricity consumption.³ Understanding the electricity consumption of industrial infrastructure can help reduce and optimize it.⁴ Non-intrusive appliance load monitoring (NILM) is an important research direction in intelligent power systems that has become a focus of researchers in recent years.^5,6 NILM breaks down the total power consumption of a power system into the contributions of individual devices. NILM methods can thereby obtain appliance-level electrical information, which can help reduce power consumption costs and enable a more rational allocation of power resources.^7–9 Although most NILM research focuses on household energy use because of the difficulty of obtaining data, applying NILM to industry has gradually gained wide attention in recent years.^10–14

In recent years, computer-aided models (e.g. signal processing, machine learning methods, and deep neural networks) have seen growing use in various fields such as medicine, economics, and industry,¹⁵ and they have also been used to solve NILM problems.^16–18 A previous study proposed a continuous wavelet transform (CWT) method to explore the electrical characteristics of switching voltage transients for NILM and compared it with the previously used short-time Fourier transform (STFT).¹⁹ Other researchers designed an algorithm combining sliding windows and neural networks for load decomposition of high-frequency household electricity data and achieved excellent performance.¹⁶ However, because the collection, transmission, and storage costs of high-frequency power data are high, such methods are limited in practice, and the decomposition of low-frequency power data is more useful for the grid. Many studies have used machine learning methods for NILM of low-frequency data.^20,21 For example, a factorial hidden Markov model (FHMM) of the load was established to transform the NILM problem into the problem of finding the optimal combination of states.²² The application of deep learning is a recent research trend, and many models have been proposed, such as deep neural networks (DNN), gated recurrent units (GRU), long short-term memory (LSTM), and co-attention networks.^23–27 Rafiq et al. compared the structures of LSTM and GRU in detail and discussed the effect of hyperparameters and regularization on load decomposition performance.²³ A recent study proposed a multi-feature combination multi-layer LSTM model that generates modified input data to improve classification performance, and which was superior to existing methods based on the GRU or a single-layer LSTM.²⁴ To overcome the problem that existing single-task DNN methods ignore inter-device dependence, some researchers proposed UNet-NILM for multi-task appliance state detection and power estimation, applying a multi-label learning strategy and multi-objective quantile regression.²⁵

However, most previous deep learning-related studies use only the total power as input. Meanwhile, the attention mechanism has been successfully applied to multi-modal feature fusion, for example of image and text, which makes it possible to introduce external features through an attention mechanism to improve the performance of NILM.

To integrate the external features in the model and fill the gap of NILM in industrial and commercial applications, this study proposed an attention mechanism-based NILM model. Since the industrial electricity sampling frequency is low, the electrical data volume is large, and the electrical state is complex, the K-means algorithm was used to cluster the two-dimensional load features in a steady state and then a combined code was obtained to reduce the dimensionality of the electrical state. Subsequently, the proposed model performed a load decomposition on the multiple features of the input. Finally, the proposed model achieved a 90.5% accuracy in load decomposition on gas station electricity consumption data.

Methods

The NILM (equation (1)) aims to obtain the power of each appliance from the decomposition of the total energy consumption information.

$$[Y_{t}^{1}, Y_{t}^{2}, \dots, Y_{t}^{N}] = f(X_{t}) \tag{1}$$

where $Y_{t}^{i}$ is the power of the i-th appliance at time t, $N$ is the number of appliances, $f$ is the NILM function, and $X_{t}$ is the total power.

Combined encoding

There are four categories of electrical loads: switched, continuous use, continuously variable, and limited state.²⁸ The NILM can be regarded as the appliance operating state combination problem, because each appliance’s operating state (all have rated power) is finite.

$$[S_{t}^{1}, S_{t}^{2}, \dots, S_{t}^{N}] = f(X_{t}) \tag{2}$$

where $S$ is the working status.

To obtain a representative load for each appliance in each state, we cluster the active power and reactive power of each appliance with the K-means algorithm, which is widely used in the field of NILM,^29,30 setting the initial number of clusters to the number of operating states of the appliance.
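The clustering step can be illustrated with a minimal sketch. It assumes plain Lloyd's K-means on (active power, reactive power) pairs for a single appliance; the function name `cluster_states`, the random initialization, and the convention of numbering states by increasing active power are illustrative assumptions, not details given in the paper.

```python
import random


def cluster_states(points, k, max_iter=100, seed=0):
    """Cluster (P, Q) samples of one appliance into k operating states.

    Returns (centroids, labels), with states renumbered so that state 0
    has the lowest active power (an illustrative convention).
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumed initialization: k random samples
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid in squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda c: (p - centroids[c][0]) ** 2 + (q - centroids[c][1]) ** 2)
            for p, q in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    # Renumber states by increasing active power.
    order = sorted(range(k), key=lambda c: centroids[c][0])
    rank = {old: new for new, old in enumerate(order)}
    return [centroids[c] for c in order], [rank[lab] for lab in labels]
```

Each returned centroid then serves as the representative (P, Q) value of one operating state of the appliance.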

Then, we define the joint state of the M appliances at a given moment as y:

$$y = [S^{1}, S^{2}, \dots, S^{M}], \quad S^{i} \in \{0, 1, \dots, n_{i} - 1\} \tag{3}$$

where $S^{i}$ is the working state of the i-th appliance and $n_{i}$ is the number of working states of the i-th appliance.

Then the NILM problem can be further transformed into equation (4).

$$y = f(X), \quad y \in \{0, 1, \dots, N - 1\} \tag{4}$$

Finally, the problem is transformed into predicting the state codes of the appliances from the total electricity consumption information, and then mapping the state codes back to the actual power to achieve load decomposition.
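One way to realize such a combined code is a mixed-radix encoding of the per-appliance states. The sketch below is an assumption for illustration: the paper does not specify how the combined code is constructed, and the names `encode_states`, `decode_states`, `decompose`, and the table `state_power` are hypothetical.

```python
def encode_states(states, n_states):
    """Map per-appliance states [S^1, ..., S^M] to one integer code."""
    code = 0
    for s, n in zip(states, n_states):
        if not 0 <= s < n:
            raise ValueError("state out of range")
        code = code * n + s  # mixed-radix positional encoding
    return code


def decode_states(code, n_states):
    """Invert encode_states: recover the per-appliance states."""
    states = []
    for n in reversed(n_states):
        code, s = divmod(code, n)
        states.append(s)
    return states[::-1]


def decompose(code, n_states, state_power):
    """Look up the representative power of each appliance for a code.

    state_power[i][s] is the representative power of appliance i in
    state s, for example a K-means centroid.
    """
    states = decode_states(code, n_states)
    return [state_power[i][s] for i, s in enumerate(states)]
```

With three appliances that have 2, 3, and 2 states, there are 12 possible codes, and a predicted code is turned into per-appliance power by a table lookup rather than by regression.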

Baseline model: LSTM-based seq2seq network

The LSTM (long short-term memory, Figure 1) network includes an input layer, a hidden layer, and an output layer. We denote the input time series as x and the output as h. At each time step, each LSTM cell updates six parameters. The LSTM can alleviate the vanishing and exploding gradient problems and is often used to learn long-term information about a sequence.³¹

Figure 1.

LSTM’s internal topology.

Attention model

Based on the seq2seq model, we proposed an attention model to fuse voltage, current, and time information. Figure 2 shows the overall structure of our model. The original seq2seq model encodes the total electricity information into the $h_{m}$ vector through the encoding unit, and then obtains the electricity consumption state code of each appliance by decoding the $h_{m}$ vector. In this paper, we fused the current, voltage, and time information with the coding vector through the attention module to generate a new coding vector M, and then completed the load analysis of the different electrical equipment by decoding the M vector. The attention model acts as an internal unit that finds intermediate representation features based on the combination of features from different sources. We obtained an external feature based on external information and a coded feature based on coded information, and then computed a multimodal gate to fuse them, because different data depend on external information to different degrees. The attention model is defined as follows:

$$z = \tanh\left(w_{q} q \oplus (w_{x} x + b_{x})\right) \tag{5}$$

$$\alpha = \tanh(w_{\alpha} z + b_{z}) \tag{6}$$

$$v = \tanh(w_{h_{m}} h_{m} + b_{h_{m}}) \tag{7}$$

$$g = \sigma(w_{g} (\alpha \oplus v)) \tag{8}$$

$$m = g \alpha + (1 - g) v \tag{9}$$

$$\hat{h}_{m} = h_{m} \oplus m \tag{10}$$

Figure 2.

Attention mechanism enhanced sequence-to-sequence network structure diagram. The original seq2seq model encodes the total electricity information into the $h_{m}$ vector through the encoding unit, and then obtains the electricity consumption state code of each appliance by decoding the $h_{m}$ vector. The attention model acts as an internal unit that finds intermediate representation features based on the combination of features from different sources.

where $w_{q}$, $w_{x}$, $w_{\alpha}$, $w_{h_{m}}$, $w_{g}$, $b_{x}$, $b_{z}$, and $b_{h_{m}}$ are parameters, q is the concatenation of the current, voltage, and time information, x is the total electricity consumption information, and $h_{m}$ is the coding feature produced by the encoder. ⊕ denotes concatenation, σ is the logistic sigmoid activation, and g is the gate applied to the new power consumption information. $\alpha$ and m are the multimodal fusion features between the new power consumption information feature and the new encoded information feature.
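Equations (5) to (10) can be traced step by step in a small dependency-free sketch. Several things here are assumptions, since the paper does not state tensor shapes: the projections $w_{q} q$ and $w_{x} x + b_{x}$ are each given dimension d, the gate g is taken to be a vector applied elementwise, and the weights are random placeholders rather than trained values. The helper names `init_params` and `attention_fuse` are hypothetical.

```python
import math
import random


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def tanh(v):
    return [math.tanh(x) for x in v]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def init_params(d_q, d_x, d_h, d, seed=0):
    """Random placeholder weights; the shapes are illustrative assumptions."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    def vec(n):
        return [rng.uniform(-0.5, 0.5) for _ in range(n)]

    return {
        "w_q": mat(d, d_q), "w_x": mat(d, d_x), "b_x": vec(d),
        "w_alpha": mat(d, 2 * d), "b_z": vec(d),
        "w_hm": mat(d, d_h), "b_hm": vec(d),
        "w_g": mat(d, 2 * d),
    }


def attention_fuse(q, x, h_m, p):
    """Fuse external features q and total consumption x with encoder state h_m."""
    # List "+" is concatenation, standing in for the ⊕ operator.
    z = tanh(matvec(p["w_q"], q) + add(matvec(p["w_x"], x), p["b_x"]))  # eq. (5)
    alpha = tanh(add(matvec(p["w_alpha"], z), p["b_z"]))                # eq. (6)
    v = tanh(add(matvec(p["w_hm"], h_m), p["b_hm"]))                    # eq. (7)
    g = sigmoid(matvec(p["w_g"], alpha + v))                            # eq. (8)
    m = [gi * ai + (1.0 - gi) * vi for gi, ai, vi in zip(g, alpha, v)]  # eq. (9)
    return h_m + m                                                      # eq. (10)
```

Because g lies in (0, 1), each component of m is a convex combination of the external feature $\alpha$ and the coded feature v, which is what lets the gate decide how much external information to admit for a given input.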

Model design

In this study, we proposed an attention mechanism enhanced sequence-to-sequence NILM model. The LSTM unit is a specialized neural network structure that is commonly used for processing sequential data because it can capture dependencies over time. In our model, the output of each LSTM unit becomes the input for the next time step, so each input data point needs to be matched with an LSTM unit.

Since our data was collected every 15 min, there were 96 time points in a day. We used 1 day’s worth of data as a batch for the model’s input and output. Therefore, to ensure that the LSTM unit can capture information from each time point, we chose to have 96 LSTM units per LSTM layer. This enables each LSTM unit to model the information at each time point and pass it as output to the next LSTM unit.

Thus, in this model, the number of LSTM units per layer equals the number of time points in the input and output data. This ensures that information from each time point is passed along and captured, allowing the model to better learn and predict the data.
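The relation between the 15 min sampling interval and the 96-step sequence length can be made concrete with a short sketch. It assumes the readings form one flat, time-ordered list that starts at the beginning of a day, and that a trailing incomplete day is dropped; the function name `daily_windows` is hypothetical.

```python
SAMPLES_PER_DAY = 24 * 60 // 15  # 96 readings per day at one reading every 15 min


def daily_windows(series, samples_per_day=SAMPLES_PER_DAY):
    """Split a flat series of readings into whole-day sequences.

    A trailing incomplete day is dropped (an illustrative assumption).
    """
    n_days = len(series) // samples_per_day
    return [
        series[d * samples_per_day:(d + 1) * samples_per_day]
        for d in range(n_days)
    ]
```

Each returned window is then one input sequence whose length matches the 96 LSTM steps described above.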

The stochastic gradient descent (SGD) optimizer was used to train our model,³² with a learning rate (LR) of 0.001 and a momentum of 0.99, and we set the number of epochs to 50.³³ For the detailed process of determining the model parameters, see the Results and discussion section.
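For readers unfamiliar with the momentum term, the following sketch shows one common formulation of the SGD-with-momentum update, using the learning rate and momentum quoted above. The paper does not say which variant or framework was used, so the update rule, the function name `sgd_momentum_step`, and the toy quadratic objective are all illustrative assumptions.

```python
def sgd_momentum_step(theta, velocity, grad, lr=0.001, momentum=0.99):
    """One update of a common SGD-with-momentum formulation.

    velocity <- momentum * velocity + grad
    theta    <- theta - lr * velocity
    """
    velocity = momentum * velocity + grad
    theta = theta - lr * velocity
    return theta, velocity


def minimize_quadratic(target=3.0, steps=5000):
    """Toy usage: minimize f(theta) = (theta - target)^2 from theta = 0."""
    theta, velocity = 0.0, 0.0
    for _ in range(steps):
        grad = 2.0 * (theta - target)  # derivative of the toy objective
        theta, velocity = sgd_momentum_step(theta, velocity, grad)
    return theta
```

With a momentum as high as 0.99, the velocity averages gradients over roughly the last hundred steps, which smooths noisy updates but makes the iterate oscillate around the optimum before settling.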

We divided the working states according to the working mode of the appliances, and obtained the representative electrical characteristics of the appliances in different states by clustering the active (P) and reactive power (Q) of the appliances with the K-means algorithm. Then, we combined the states of all appliances at a given moment into a new integrated state, and at the same time we extracted the electricity, voltage, power, current, power factor, and time information from the total electricity consumption data. The electrical data collected from each appliance sub-meter and the total electricity consumption data were used to build the model, and we randomly divided the total data into a training set, a validation set, and a test set in the ratio 8:1:1 for model training, parameter tuning, and verification of model performance, respectively.
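A random 8:1:1 split can be sketched as follows. The sketch assumes the split is made over whole days, that fractional counts are rounded down with the remainder assigned to the test set, and that a fixed seed is used; none of these details is stated in the paper, and the function name `split_days` is hypothetical.

```python
import random


def split_days(n_days, ratios=(0.8, 0.1, 0.1), seed=42):
    """Randomly split day indices into train, validation, and test sets."""
    days = list(range(n_days))
    random.Random(seed).shuffle(days)
    n_train = int(ratios[0] * n_days)
    n_val = int(ratios[1] * n_days)
    train = days[:n_train]
    val = days[n_train:n_train + n_val]
    test = days[n_train + n_val:]  # remainder goes to the test set
    return train, val, test
```

Under these assumptions, 191 days yield 152 training days, 19 validation days, and 20 test days.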

We trained our proposed model with the extracted electrical features and the integrated state codes. Once the model is trained, it predicts the integrated state code from the input electrical features, and the code is then used to index the power of each appliance, completing the load decomposition.

Experiments and results

Data acquisition

We collected electricity and power data for 191 days from December 2020 to August 2021 at a gas station in Jinchang, Gansu Province. To avoid safety hazards, the Internet of Things (IoT) devices were not installed in the gas station operation area, but on switches in the main distribution room. We installed the acquisition devices and sensors at the end of the appliances and the distribution room, respectively. In practice, many devices are connected to the same assembly line, and multiple devices could be controlled by switches in the distribution room as needed. Electrical data collected by sensors match real-world reality due to the largely uniform deployment of switch-controlled electrical equipment.

We used snap-on current transformers to connect the acquisition device to the energized conductor (Figure 3(a)). The collection devices recorded electricity consumption data (one sample every 15 min) for all appliances (Figure 3(b)).

Figure 3.

Data acquisition equipment: (a) snap-on current transformers and (b) AC branch power detector.

We recorded several quantities, such as active power and reactive power, for the 12 appliances of the gas station. All these appliances are gas station facilities: Submersible Pump (A1), Central Air Conditioning (A2), Canopy Lights Strip (A3), Uninterruptible Power Supply (UPS, A4), Kitchen Socket (A5), Integrated Office Socket (A6), Lounge Socket (A7), Outdoor Advertising Signage (A8), Counter Socket (A9), Convenience Store Socket (A10), Freezer (A11), and Canopy Lights (A12). The Convenience Store Socket, Counter Socket, Kitchen Socket, Lounge Socket, and Integrated Office Socket each combine the electrical characteristics of the small appliances in the corresponding room. The UPS provides a 24 h uninterruptible power supply for devices that require high power supply stability and is in standby mode most of the time. The Freezer, Canopy Lights, Outdoor Advertising Signage, Canopy Lights Strip, and Central Air Conditioning are high-power appliances that have multiple operating modes and are time-sensitive. The measured values of the Submersible Pump denote the sum of the electrical characteristics of all the refueling units. Since fuel dispensers operate 24 h a day, it is difficult to determine their complete duty cycle.

We obtained the power factor, power, current, voltage, electric quantity, and electricity data from the power company’s data center. Figure 4 compares the total power consumption data with the sensor data (18 consecutive days were randomly selected). The trends and magnitudes of the two curves are generally consistent, which indicates that the sensor data are reliable.

Figure 4.

Comparison of collected data and real power data.

Performance evaluation

The 191 days of data were randomly divided into a training set, a validation set, and a test set in proportions of 80%, 10%, and 10%. The performance of our model was evaluated on the test set using two categories of metrics^34,35: the first category includes accuracy, recall, precision, and F1 score, and the second category consists of mean absolute error (MAE) and root mean squared error (RMSE). The first category compares the predicted states with the actual states, and the second category compares the decomposed power signal with the real power signal (equations (11) and (12)). We used several methods for comparison, including K-nearest neighbor, support vector machine, and random forest. To further verify that our proposed fusion mechanism works, we also compared the soft attention mechanism, the hard attention mechanism, and the baseline model.

$$\mathrm{MAE} = \frac{1}{n} \sum_{i = 1}^{n} | y_{i} - Y_{i} | \tag{11}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} (y_{i} - Y_{i})^{2}} \tag{12}$$

where $y_{i}$ and $Y_{i}$ denote the decomposed power signal and the real power signal, respectively, and $n$ denotes the number of samples. Smaller MAE and RMSE values indicate smaller prediction errors.
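Equations (11) and (12) translate directly into code. The sketch below is a plain-Python illustration; the function names `mae` and `rmse` are chosen here and are not from the paper.

```python
import math


def mae(decomposed, real):
    """Mean absolute error between decomposed and real power, eq. (11)."""
    if len(decomposed) != len(real) or not real:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(y - Y) for y, Y in zip(decomposed, real)) / len(real)


def rmse(decomposed, real):
    """Root mean squared error between decomposed and real power, eq. (12)."""
    if len(decomposed) != len(real) or not real:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((y - Y) ** 2 for y, Y in zip(decomposed, real)) / len(real))
```

Because RMSE squares each error before averaging, it is never smaller than MAE on the same data and penalizes a few large deviations more heavily, which is why the two are reported together.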

Results and discussion

Figure 5 shows the operating states and the corresponding active/reactive power of each appliance that we obtained by K-means clustering.

Figure 5.

The working state and power value of the appliance (S1 means status 1 and so on).

Figure 6 compares the power values of all appliances before and after coding. The yellow curve of index power values closely fits the blue curve of the actual power, indicating that the appliance states we defined are accurate. Figure 7 shows the performance of state prediction, where the index power value (P) is the actual power for each operating state.

Figure 6.

Comparison of P and actual power of all appliances.

Figure 7.

The performance of state prediction.

Figure 8 shows the mean accuracy, MAE, and RMSE of our proposed model and the other models used for comparison. Our model achieves an accuracy of 90.5% and an MAE of only 16.15, which is better than the other models. The results show that the proposed method enhances the capability of the baseline model by introducing electrical features (such as power) through the attention mechanism. In addition, Figure 9(a) shows the MAE, RMSE, and accuracy of our model on the training, validation, and test sets. As shown in the figure, the final test performance of the model is an accuracy of 0.905 and an MAE of 16.15. Figure 9(b) shows how the loss changes over the epochs. To distinguish the two curves, we use blue dots to represent the training loss and red connecting lines to represent the validation loss.

Figure 8.

Model performance.

Figure 9.

The performance and loss values of different datasets: (a) performance of the proposed model on different datasets and (b) the loss values of the train and validation datasets at different epochs.

Notably, to determine the parameters of the model, we compared the performance of the model when using different optimizers (e.g. SGD, Adam, and RMSprop³⁶), and we also compared the performance of the model at different learning rates (Figure 10). It can be seen from the results that when the optimizer is SGD, the MAE and RMSE values of the model are the lowest and the accuracy is the highest, while the model performance is the best when the learning rate parameter is 0.001 with the SGD optimizer.

Figure 10.

The process of determining model parameters: (a) impacts of optimizers on model performance in the test dataset and (b) impacts of learning rate on model performance in the test dataset, with optimizer = SGD (LR = learning rate).

The predicted total power of our model is very similar to the actual total power, which indicates that our model is very reliable (Figure 11).

Figure 11.

The comparison of predicted power data and real data.

According to the algorithm proposed in this study, the load decomposition can be calculated to get the proportion of energy consumption of each electrical appliance, as well as the corresponding cost. By analyzing the energy consumption proportion of all electrical appliances, we found that the most energy-consuming one is the Submersible Pump (Figure 12). Canopy Lights Strip, Kitchen Socket, Integrated Office Socket, and Outdoor Advertising Signage account for nearly half of the total energy consumption. The other appliances consume a smaller percentage of energy due to their low working power or less frequent use. These results are helpful to reduce the cost of power consumption and optimize the allocation of power resources.
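The proportion of energy consumption per appliance follows from the decomposed power series by simple arithmetic. The sketch below assumes power values in watts sampled every 15 min (0.25 h) and a flat tariff; the units, the tariff, and the function names `energy_shares` and `energy_cost` are illustrative assumptions rather than details from the paper.

```python
def energy_shares(power_by_appliance, interval_h=0.25):
    """Convert decomposed power series (W) to energy (kWh) and shares.

    power_by_appliance maps an appliance name to its list of power readings.
    """
    energy_kwh = {
        name: sum(series) * interval_h / 1000.0
        for name, series in power_by_appliance.items()
    }
    total = sum(energy_kwh.values())
    shares = {
        name: (e / total if total else 0.0)
        for name, e in energy_kwh.items()
    }
    return energy_kwh, shares


def energy_cost(energy_kwh, tariff_per_kwh):
    """Cost per appliance under a flat tariff (hypothetical pricing model)."""
    return {name: e * tariff_per_kwh for name, e in energy_kwh.items()}
```

For example, an appliance drawing a constant 1000 W for four consecutive readings accounts for 1 kWh, and its share is that energy divided by the total over all appliances.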

Figure 12.

The proportion of decomposition power data of each electrical appliance.

Conclusions

In this study, we proposed an attention-based model that integrates multiple electrical features to enhance the sequence-to-sequence network for NILM. By training the model using the total energy consumption signal and the combined status code, our model achieves 90.5% accuracy in load decomposition on gas station electricity consumption data. The decomposition results show that our enhancement strategy works and that our model outperforms traditional methods. Our results demonstrate that the most energy-consuming appliance is the Submersible Pump, while the Canopy Lights Strip, Kitchen Socket, Integrated Office Socket, and Outdoor Advertising Signage account for nearly half of the total energy consumption. These results help to reduce the cost of power consumption and optimize the allocation of power resources. In future work, we plan to design new architectures to accelerate the training and decomposition process of the model and improve the prediction quality.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Rong Yang

References

1. K-D, Z-G. Resilient event-triggered load frequency control for cyber-physical power systems under DoS attacks. IEEE Trans Power Syst 2022; 1–11.

2. K-D, Z-G, Huang. Differential evolution-based three stage dynamic cyber-attack of cyber-physical power systems. IEEE/ASME Trans Mechatron 2023; 28: 1137–1148.

3. Holmegaard, Kjærgaard. Towards NILM for industrial settings. In: Proceedings of the 2015 ACM sixth international conference on future energy systems, 2015.

4. Henning, Hasselbring, Möbius. A scalable architecture for power consumption monitoring in industrial production environments. In: 2019 IEEE international conference on fog computing (ICFC), 2019. New York: IEEE.

5. Dong, Chung. Non-intrusive signature extraction for major residential loads. IEEE Trans Smart Grid 2013; 4(3): 1421–1430.

6. Athanasiadis, Papadopoulos, Doukas. Real-time non-intrusive load monitoring: a light-weight and scalable approach. Energy Build 2021; 253: 111523.

7. Jiang, Yang, et al. Research of NILM in offshore oil platform power system. In: 2020 IEEE international conference on power, intelligent computing and systems (ICPICS), 2020. IEEE.

8. Yang, Liu. Sequence to point learning based on an attention neural network for nonintrusive load decomposition. Electronics 2021; 10(14): 1657.

9. Tabatabaei, Dick. Toward non-intrusive load monitoring via multi-label classification. IEEE Trans Smart Grid 2017; 8(1): 26–40.

10. Filip. Blued: a fully labeled public dataset for event-based nonintrusive load monitoring research. In: 2nd workshop on data mining applications in sustainability (SustKDD), 2011.

11. Kolter, Johnson. REDD: a public data set for energy disaggregation research. In: Workshop on data mining applications in sustainability (SIGKDD), San Diego, CA, 2011.

12. Yue, Witzig, Jorde, et al. Bert4nilm: a bidirectional transformer model for non-intrusive load monitoring. In: Proceedings of the 5th international workshop on non-intrusive load monitoring, 2020.

13. Shi, et al. Nonintrusive load monitoring based on complementary features of spurious emissions. Electronics 2019; 8(9): 1002.

14. Meier, Cautley. Practical limits to the use of non-intrusive load monitoring in commercial buildings. Energy Build 2021; 251: 111308.

15. Chiniforooshan Esfahani. A data-driven physics-informed neural network for predicting the viscosity of nanofluids. AIP Adv 2023; 13(2): 025206.

16. Zhang, Zhong, Wang, et al. Sequence-to-point learning with neural networks for non-intrusive load monitoring. In: Proceedings of the AAAI conference on artificial intelligence, 2018.

17. Singh, Majumdar. Deep sparse coding for non-intrusive load monitoring. IEEE Trans Smart Grid 2018; 9(5): 4669–4678.

18. Lin Y-H, Tsai M-S. Non-intrusive load monitoring by novel neuro-fuzzy classification considering uncertainties. IEEE Trans Smart Grid 2014; 5(5): 2376–2384.

19. Duarte, Delmar, Goossen, et al. Non-intrusive load monitoring based on switching voltage transients and wavelet transforms. In: 2012 future of instrumentation international workshop (FIIW) proceedings, 2012. New York: IEEE.

20. Sreevidhya, Kumar, Ilango. Design and implementation of non-intrusive load monitoring using machine learning algorithm for appliance monitoring. In: 2019 IEEE international conference on intelligent techniques in control, optimization and signal processing (INCOS), 2019. New York: IEEE.

21. Gong, Han, Zhou, et al. A SVM optimized by particle swarm optimization approach to load disaggregation in non-intrusive load monitoring in smart homes. In: 2019 IEEE 3rd conference on energy internet and energy system integration (EI2), 2019. New York: IEEE.

22. Bonfigli, Principi, Fagiani, et al. Non-intrusive load monitoring by using active and reactive power in additive factorial hidden Markov models. Appl Energy 2017; 208: 1590–1607.

23. Rafiq, Zhang, et al. Regularized LSTM based deep learning model: first step towards real-time non-intrusive load monitoring. In: 2018 IEEE international conference on smart energy grid engineering (SEGE), 2018. New York: IEEE.

24. Kim J-G, Lee. Appliance classification by power signal analysis based on multi-feature combination multi-layer LSTM. Energies 2019; 12(14): 2804.

25. Faustine, Pereira, Bousbiat, et al. UNet-NILM: a deep neural network for multi-tasks appliances state detection and power estimation in NILM. In: Proceedings of the 5th international workshop on non-intrusive load monitoring, 2020.

26. Murray, Stankovic, et al. Transferability of neural network approaches for low-rate energy disaggregation. In: ICASSP 2019-2019 IEEE international conference on acoustics, speech and signal processing (ICASSP), 2019. New York: IEEE.

27. Xia, Liu, et al. Dilated residual attention network for load disaggregation. Neural Comput Appl 2019; 31(12): 8931–8953.

28. Ridi, Gisler, Hennebert. A survey on intrusive load monitoring for appliance recognition. In: 2014 22nd international conference on pattern recognition, 2014. New York: IEEE.

29. Khan MMR, Siddique MAB, Sakib. Non-intrusive electrical appliances monitoring and classification using K-nearest neighbors. In: 2019 2nd international conference on innovation in engineering and technology (ICIET), 2019. New York: IEEE.

30. Basu, Debusschere, Douzal-Chouakria, et al. Time series distance-based methods for non-intrusive load monitoring in residential buildings. Energy Build 2015; 96: 109–117.

31. Tasdelen, Sen. A hybrid CNN-LSTM model for pre-miRNA classification. Sci Rep 2021; 11(1): 14125–14129.

32. Zhang. Solving large scale linear prediction problems using stochastic gradient descent algorithms. In: Proceedings of the twenty-first international conference on machine learning, 2004.

33. Kabir, Siddique, Kotwal MRA, et al. Bangla text document categorization using stochastic gradient descent (SGD) classifier. In: 2015 international conference on cognitive computing and information processing (CCIP), 2015. New York: IEEE.

34. Salerno, Rabbeni. An extreme learning machine approach to effective energy disaggregation. Electronics 2018; 7(10): 235.

35. Athanasiadis, Doukas, Papadopoulos, et al. A scalable real-time non-intrusive load monitoring system for the estimation of household appliance power consumption. Energies 2021; 14(3): 767.

36. Zou, Shen, Jie, et al. A sufficient condition for convergences of Adam and RMSprop. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019.

Application of attention mechanism enhanced neural network in non-invasive load monitoring of industrial power data
