
Abstract

Industrial processes are becoming increasingly complex, requiring advanced modelling techniques to understand their behaviour and improve their performance. In this context, deep learning algorithms have proven to be effective tools for modelling dynamic systems, with Recurrent Neural Networks (RNNs) being particularly suitable for time-series data. However, the computational complexity of deep learning models can be a limitation in industrial environments, where real-time responses are required.

This work proposes the use of Deep Echo State Networks to model an industrial system. The aim is to evaluate their performance in real-time industrial applications when running on embedded devices. The approach is validated on a process composed of four interconnected water tanks, which exhibits typical nonlinear industrial dynamics. Among several candidate architectures (including vanilla RNNs and LSTMs), Deep ESNs were selected for their balance of accuracy and computational efficiency. Different input-output setups and numbers of Deep ESN layers are tested, and results are compared with LSTMs in terms of accuracy and execution time. Finally, the best Deep ESN models are implemented on industrial embedded devices to evaluate the possibility of running these models in real time.

The proposed approach achieved up to a 33% reduction in RMSE and a 14% improvement in $R^{2}$ compared to traditional reservoir computing, highlighting its superior predictive performance. The results show that Deep ESN models can effectively model the industrial system, with the best configurations achieving high accuracy and low execution times, demonstrating the feasibility of running these models in real time in industrial environments.

Keywords

reservoir computing, deep echo state networks, dynamics, data-based modelling, embedded devices

1. Introduction

Industrial processes tend to become increasingly advanced and complex, and this complexity implies greater difficulties in their management and monitoring. For this reason, modelling industrial systems is a task of great relevance to better understand their characteristic behaviour and how it evolves over time. In addition, having a model of the system also facilitates the detection of failures or scenarios that deviate from normal behaviour, allowing operators to take action to correct these problems. Eventually, this modelling leads to more detailed analyses that allow the process to be optimised, leading to improved performance and greater efficiency, thereby reducing energy consumption and increasing production.¹

However, the modelling and representation of dynamical systems in an industrial environment presents certain difficulties: it is common to have a large amount of data extracted from processes that change in a matter of milliseconds, but extracting knowledge about the process behaviour from this data is not as easy as it may seem.² For this reason, it is important to obtain a model that is as close as possible to the reference industrial process, which allows the dynamics of the system to be represented as faithfully as possible.

In this respect, although there are different approaches to develop such models of physical systems, machine learning tools are among the most widely used. Among the various machine learning algorithms available, Recurrent Neural Networks (RNNs) have proven to be particularly effective in modelling dynamic systems, especially those that involve temporal sequences.³ RNNs distinguish themselves from other neural networks by their architecture, which includes recurrent connections to capture past dependencies across time steps. This architectural feature allows RNNs to model sequential data with high-dimensional hidden states, while efficiently handling non-linear and complex dynamical behaviours that arise in time-series data.⁴ Recent applications demonstrate the successful use of spatio-temporal deep learning networks in dynamic prediction tasks, such as lightning nowcasting,⁵ which reinforces the relevance of such models for real-world sequence modelling under uncertainty.

Within machine learning, deep learning has emerged in recent years as a very powerful tool for modelling complex systems, mainly due to its ability to capture non-linear patterns and relationships in data.⁶ Deep learning is based on the use of multi-layer neural networks, which allow features to be extracted from the input data at different levels of abstraction. This has enabled its application in dynamic system modelling under uncertain or complex conditions, such as vehicle dynamics in snowy environments,⁷ real-time fault diagnosis in industrial systems,⁸ and efficiency improvements in automated manufacturing.¹ Furthermore, comparative studies have shown the effectiveness of deep learning techniques in high-dimensional, time-sensitive domains like media content classification,⁹ cardiac condition detection,¹⁰ and energy monitoring,¹¹ reinforcing its generalisability in safety-critical or resource-constrained scenarios.

However, the use of increasingly complex architectures also implies greater computational complexity, requiring more powerful hardware platforms and longer training times.¹² Despite this, with improvements in workload distribution and the use of Graphics Processing Units (GPUs), the training and execution times of deep learning models have been significantly reduced, allowing their deployment in real-time applications across a variety of industrial domains.¹³

Nevertheless, the use of such deep learning models in industrial environments can be challenging, especially where computational or hardware resources are limited, and where real-time responses are required to promptly detect anomalies and take action. This has motivated research into lightweight and efficient modelling techniques capable of running on embedded platforms while maintaining predictive performance.¹,⁸

A particularly interesting and valued algorithm in this respect is Long Short-Term Memory (LSTM), a type of recurrent neural network that has proven effective in modelling dynamic systems, particularly in time series prediction.¹⁴ However, despite their modelling capabilities, LSTMs often suffer from slow convergence and training instability.

As an alternative, Reservoir Computing (RC) has gained attention due to its much faster and more efficient training with lower hardware requirements. This is accomplished by generating a hidden layer of recurrent neurons, called the reservoir, that is randomly initialised and held fixed during training, and a linear output layer that is trained under supervision.¹⁵ Approaches such as adaptive reservoir computing have already demonstrated success in manufacturing environments,¹⁶ reinforcing the suitability of this paradigm in industrial contexts.

Among RC models, the Echo State Network (ESN) stands out for its simplicity and effectiveness in modelling complex systems. Moreover, ESNs allow for model adaptation by retraining only the output weights, avoiding full retraining of the model.¹⁷ More specifically, with the introduction of Deep Echo State Networks (Deep ESNs),¹⁸ reservoir computing has merged with deep learning paradigms to build hierarchical architectures of multiple reservoir layers. These models outperform standard ESNs by capturing temporal dependencies at different time scales,¹⁹ and have shown promising performance. Their architecture allows them to efficiently model multi-scale industrial dynamics while avoiding the high training cost of backpropagation-based models.

The lightness, efficiency, and adaptability of standard ESNs and Deep ESNs make them particularly attractive for industrial system modelling, especially in scenarios requiring embedded real-time implementation. Although some studies have explored the application of ESNs in industrial contexts, many focus on simulated or lab-scale systems,²⁰ and there remains a gap in evaluating their performance on real-world industrial setups. This work aims to help address this gap and contribute to ongoing efforts in the community to develop scalable, low-complexity solutions for industrial modelling.

For these reasons, the application of Deep ESN to the modelling of real industrial systems, and especially the possibility of running these models on low-cost embedded devices, is an interesting and promising field of study that can be useful for improving these industrial processes.

In this context, this work proposes the application of Deep ESNs to model a real industrial process composed of four interconnected tanks, while assessing the feasibility of deploying these models on embedded platforms. Building on previous work,²¹ this study extends the evaluation to the complete system, systematically comparing input/output configurations and Deep ESN architectures to determine the optimal setup for accurate and efficient real-time modelling.

Furthermore, a performance comparison of the Deep ESN and LSTM models is performed in terms of accuracy and execution time with the selected best configurations, to evaluate the efficiency of the proposed approach. LSTM models are selected for comparison because they are among the most widely used RNN-based methods for time series modelling, and have shown strong performance in various dynamic system applications. Additionally, previous studies have compared ESNs and LSTMs, highlighting their differences in terms of computational efficiency and prediction accuracy.²²

Finally, the selected Deep ESN models are implemented on low-cost industrial embedded devices to evaluate the possibility of running these models in real time.

This work is structured as follows: Section 2 provides an overview of Deep Reservoir Computing techniques, focusing on Deep Echo State Networks. Section 3 outlines the methodology to implement the Deep Reservoir Computing models, including the selection of the best architectural setups, the comparison with LSTMs and the testing on embedded devices. Section 4 describes the quadruple-tank industrial plant used in the experiments, detailing the different operating scenarios considered. Section 5 presents the results obtained from the experiments conducted with the proposed methodology. Finally, Section 6 summarises the main conclusions of the work and outlines future research directions.

2. Deep reservoir computing

Reservoir Computing (RC) is a machine learning paradigm that has gained popularity in recent years due to its ability to model complex dynamical systems. RC is based on the concept of a reservoir, a high-dimensional dynamical system that processes input signals and generates a rich representation of the input data. The reservoir is a randomly generated network of interconnected nodes, which are typically sparsely connected and have fixed weights. The input data is fed into the reservoir, and the reservoir’s dynamics transform the input data into a high-dimensional representation, which is then used to predict the output of the system. This projection maps temporal patterns into a geometric space where they become more linearly separable. The output is then generated by a linear readout layer that maps the reservoir’s high-dimensional representation to the desired output. The key advantage of RC is that the reservoir’s dynamics are fixed and do not need to be trained, which simplifies the training process and makes these models more robust to noise and perturbations.

Echo State Networks (ESNs)²³ constitute a particular implementation of Reservoir Computing. ESNs rely on a fixed, randomly initialised reservoir and a trainable readout layer. For this reason, ESNs offer an efficient and effective alternative to traditional RNNs,¹⁸ as they are easier to train and require fewer trainable parameters. The reservoir acts as a non-linear, high-dimensional dynamical system, which maps an input vector $u (k) \in R^{N_{U}}$ of dimension $N_{U}$ into a state vector $x (k) \in R^{N_{R}}$ with $N_{R}$ reservoir states.

The most common variation of this architecture incorporates a leaky integrator mechanism to modulate the recall of previous states.²⁴ This variant, known as Leaky Integrator ESN (LI-ESN), is the one used in this work, and it is governed by the following state equation:

$$
x (k) = (1 - α)\, x (k - 1) + α\, ϕ \left( W_{res}\, x (k - 1) + W_{in}\, u (k) + θ \right) \tag{1}
$$

In this equation, $x (k) \in R^{N_{R}}$ represents the reservoir state at time $k$, while $u (k) \in R^{N_{U}}$ denotes the input vector. The weights of the reservoir connections are defined by $W_{res} \in R^{N_{R} \times N_{R}}$, and the input-to-reservoir weights are given by $W_{in} \in R^{N_{R} \times N_{U}}$. The vector $θ \in R^{N_{R}}$ is the bias applied to the reservoir units. The leak rate $α \in [0, 1]$ controls the influence of previous states on the current reservoir state, while $ϕ$ represents the activation function applied element-wise, typically a hyperbolic tangent or sigmoid function.
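As an illustration of equation (1), the update can be written in a few lines of NumPy. This is a minimal sketch, assuming a hyperbolic tangent activation; the function name `li_esn_step`, the matrix sizes and the random weights are illustrative choices and are not taken from the experiments in this work.

```python
import numpy as np

def li_esn_step(x_prev, u, W_res, W_in, theta, alpha):
    """One leaky-integrator ESN update, following equation (1)."""
    pre_activation = W_res @ x_prev + W_in @ u + theta
    return (1.0 - alpha) * x_prev + alpha * np.tanh(pre_activation)

# Illustrative sizes and weights only (assumed, not the paper's values).
rng = np.random.default_rng(0)
n_u, n_r = 2, 5
W_res = rng.uniform(-0.5, 0.5, size=(n_r, n_r))
W_in = rng.uniform(-1.0, 1.0, size=(n_r, n_u))
theta = np.zeros(n_r)

# Drive the reservoir with a short random input sequence.
x = np.zeros(n_r)
for u in rng.uniform(0.0, 1.0, size=(10, n_u)):
    x = li_esn_step(x, u, W_res, W_in, theta, alpha=0.3)
```

With $α = 1$ the previous state only enters through $W_{res}$, whereas with $α = 0$ the state never changes, which is why intermediate leak rates are used to tune how quickly the reservoir reacts.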

The readout mechanism translates the reservoir states into an $N_{Y}$ -dimensional output vector $y (k) \in R^{N_{Y}}$ using a linear transformation. The weights of this transformation are the only parameters trained in the ESN, typically optimized via standard least-squares regression. The output equation is given by:

$$
y (k) = W_{out}\, x (k) + θ_{out}, \tag{2}
$$

where $y (k) \in R^{N_{Y}}$ represents the output vector at time $k$, $W_{out} \in R^{N_{Y} \times N_{R}}$ denotes the reservoir-to-readout weight matrix, and $θ_{out} \in R^{N_{Y}}$ is the bias vector applied to the output.

Figure 1.

Structure of a shallow ESN.

The Shallow ESN architecture is depicted in Figure 1, where bias terms have been omitted for clarity. This structure is composed of three main layers: the input layer, the reservoir layer, and the readout or output layer.

The input layer, parameterised by the weight matrix $W_{in}$ , connects the $N_{U}$ inputs from the input vector $u (k)$ to the reservoir. The reservoir layer consists of a large number, $N_{R}$ , of internal units (or nodes), with activations $x_{1} (k), \dots, x_{N_{R}} (k)$ forming the reservoir state vector $x (k)$ at time $k$ . These nodes are interconnected via the reservoir weight matrix $W_{res}$ , allowing for internal feedback and recurrent dynamics. The readout layer transforms the $N_{R}$ reservoir states in $x (k)$ into $N_{Y}$ outputs in $y (k)$ , using the weights specified in $W_{out}$ .

The weight matrices $W_{res}$ and $W_{in}$ are initialised with random values, and specific adjustments are then applied to foster desirable dynamic behaviour in the reservoir. To achieve sparsity, most elements in these matrices are set to zero, leaving only a small fraction of non-zero weights distributed randomly. This sparsity can be interpreted as a design choice that reduces memory and computational cost while preserving the dynamical richness of the reservoir. The reservoir weight matrix $W_{res}$ is then scaled such that its spectral radius $ρ_{max}$, that is, its largest eigenvalue in absolute value, is approximately equal to $1$ (or slightly greater in some cases). This scaling ensures that the reservoir satisfies the so-called Echo State Property (ESP),²⁵,²⁶ maintaining its dynamics on the edge of stability. Similarly, the input weight matrix $W_{in}$ is scaled by a predefined factor $iss$, with its values confined to the range $[- iss, iss]$.
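The initialisation just described can be sketched as follows. This is a hedged example rather than the procedure used in this work: the function name `init_reservoir`, the 10% density, the uniform range before scaling and the default values are all assumptions made for illustration.

```python
import numpy as np

def init_reservoir(n_r, n_u, spectral_radius=0.95, iss=1.0, density=0.1, seed=0):
    """Sparse random W_res scaled to a target spectral radius, and sparse W_in in [-iss, iss]."""
    rng = np.random.default_rng(seed)

    # Random recurrent weights; all but a fraction `density` are set to zero.
    W_res = rng.uniform(-1.0, 1.0, size=(n_r, n_r))
    W_res[rng.random((n_r, n_r)) >= density] = 0.0

    # Rescale so that the largest absolute eigenvalue equals `spectral_radius`.
    current_radius = np.max(np.abs(np.linalg.eigvals(W_res)))
    W_res *= spectral_radius / current_radius

    # Input weights drawn from [-iss, iss], with the same sparsity (assumed).
    W_in = rng.uniform(-iss, iss, size=(n_r, n_u))
    W_in[rng.random((n_r, n_u)) >= density] = 0.0
    return W_res, W_in

W_res, W_in = init_reservoir(n_r=100, n_u=3, spectral_radius=0.95, iss=0.5)
```

The eigenvalue computation is the only costly step, and it is performed once, before training.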

When the reservoir processes an input sequence $\{u (k)\}_{k = 1}^{N_{k}}$ iteratively, it generates a sequence of high-dimensional reservoir state vectors $\{x (k)\}_{k = 1}^{N_{k}}$. These state vectors are then combined linearly in the readout layer using trainable weights to produce the output sequence $\{y (k)\}_{k = 1}^{N_{k}}$. Representing these sequences in matrix form, the relationship can be expressed as $Y = W_{out} X$, where $Y$ and $X$ are the matrices of outputs and reservoir states, respectively. The readout weight matrix $W_{out}$ is usually computed using the ridge regression method.²⁷
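A minimal sketch of a ridge regression readout is given below, assuming that states and targets are stored column-wise ($X \in R^{N_{R} \times N_{k}}$, $Y \in R^{N_{Y} \times N_{k}}$) and that the bias is handled by appending a row of ones to $X$. The function name `train_readout` and the regularisation value are illustrative assumptions; for simplicity the bias is regularised together with the weights.

```python
import numpy as np

def train_readout(X, Y, ridge=1e-6):
    """Closed-form ridge regression: returns (W_out, theta_out).

    X: reservoir states, shape (N_R, N_k); Y: targets, shape (N_Y, N_k).
    """
    n_k = X.shape[1]
    X_aug = np.vstack([X, np.ones((1, n_k))])           # extra row of ones for the bias
    gram = X_aug @ X_aug.T + ridge * np.eye(X_aug.shape[0])
    W_aug = np.linalg.solve(gram, X_aug @ Y.T).T        # Y X^T (X X^T + ridge I)^-1
    return W_aug[:, :-1], W_aug[:, -1]

# Synthetic check: recover a known linear map from noiseless data.
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(5, 200))
W_true = rng.uniform(-1.0, 1.0, size=(2, 5))
b_true = np.array([0.5, -0.2])
Y = W_true @ X + b_true[:, None]
W_out, theta_out = train_readout(X, Y, ridge=1e-8)
```

Because only this linear system has to be solved, training cost is dominated by collecting the states rather than by iterative gradient descent.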

There are some variations of the presented Shallow ESN model, such as including $y (k)$ in the reservoir state equation (see equation 1) as a feedback connection, or incorporating direct connections from the input to the readout layer. However, these configurations, which are described in Lukoševičius,²⁷ fall outside the scope of this work and will not be discussed here.

In contrast, the Deep ESN architecture, which was introduced in Gallicchio et al.,¹⁸ comprises multiple reservoir layers arranged sequentially: the first reservoir directly processes the input vector, while each subsequent reservoir takes as input the state vector produced by the preceding layer. The architecture of a Deep ESN is depicted in Figure 2. This hierarchical structure allows the network to capture complex temporal dependencies across different time scales, enhancing its ability to model complex dynamical systems.

Figure 2.

Structure of a deep ESN.

Typically, all reservoirs in the architecture are designed with the same number of units, denoted as $N_{R}$ . Thus, for a configuration with $N_{L}$ layers, the total number of reservoir units in the hierarchical model can be calculated as $N_{T} = N_{R} \times N_{L}$ . The state equations for the reservoirs in a Deep ESN are similar to those of a Shallow ESN, with the main difference being the inclusion of the state vector from the previous layer in the input to the current layer. For the first layer ( $l = 1$ ), the state equation is given by:

$$
x^{1} (k) = (1 - α^{1})\, x^{1} (k - 1) + α^{1}\, ϕ \left( W_{res}^{1}\, x^{1} (k - 1) + W_{in}^{1}\, u (k) + θ^{1} \right) \tag{3}
$$

and for the other layers ($l > 1$):

$$
x^{l} (k) = (1 - α^{l})\, x^{l} (k - 1) + α^{l}\, ϕ \left( W_{res}^{l}\, x^{l} (k - 1) + W_{in}^{l}\, x^{l - 1} (k) + θ^{l} \right) \tag{4}
$$

In these two formulas, $u (k) \in R^{N_{U}}$ denotes the input vector at time $k$ , while $x^{l} (k) \in R^{N_{R}}$ represents the reservoir state vector for layer $l$ at the same time step. The parameter $α^{l} \in [0, 1]$ defines the leaking rate for layer $l$ , and $ϕ$ serves as the activation function, identical to that used in Shallow ESNs. The matrix $W_{res}^{l} \in R^{N_{R} \times N_{R}}$ corresponds to the reservoir weight matrix for layer $l$ , and $W_{in}^{l}$ is the input weight matrix for the same layer. Specifically, $W_{in}^{1} \in R^{N_{R} \times N_{U}}$ applies to the first layer, while $W_{in}^{l} \in R^{N_{R} \times N_{R}}$ applies to subsequent layers.
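The layer-wise update of equations (3) and (4) can be sketched as a loop in which each layer is driven by the freshly computed state of the layer below it. The function name `deep_esn_step`, the dictionary layout of the layer parameters and all numerical values are assumptions made for illustration, with a hyperbolic tangent as the activation.

```python
import numpy as np

def deep_esn_step(states, u, layers):
    """Advance every layer by one time step, following equations (3) and (4).

    states: list with the previous state x^l(k-1) of each layer.
    layers: list of dicts with keys "W_res", "W_in", "theta", "alpha".
    """
    new_states = []
    layer_input = u                      # the first layer receives u(k)
    for x_prev, p in zip(states, layers):
        pre = p["W_res"] @ x_prev + p["W_in"] @ layer_input + p["theta"]
        x_new = (1.0 - p["alpha"]) * x_prev + p["alpha"] * np.tanh(pre)
        new_states.append(x_new)
        layer_input = x_new              # layer l+1 is driven by x^l(k)
    return new_states

# Illustrative three-layer network (sizes and weights are assumed).
rng = np.random.default_rng(0)
n_u, n_r, n_l = 4, 6, 3
layers = []
for layer_idx in range(n_l):
    n_in = n_u if layer_idx == 0 else n_r
    layers.append({
        "W_res": rng.uniform(-0.3, 0.3, size=(n_r, n_r)),
        "W_in": rng.uniform(-1.0, 1.0, size=(n_r, n_in)),
        "theta": np.zeros(n_r),
        "alpha": 0.5,
    })

states = [np.zeros(n_r) for _ in range(n_l)]
for u in rng.uniform(0.0, 1.0, size=(20, n_u)):
    states = deep_esn_step(states, u, layers)
```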

Although Deep ESNs add hierarchical structure, they share key operational mechanisms with Shallow ESNs: the weight matrices of all the reservoirs are initialised with random values drawn from a uniform distribution (e.g., $[0, 1]$) and subsequently scaled to ensure that the system operates at the edge of stability. This stability is guided by the Echo State Property (ESP) for Deep ESNs, described in detail in Gallicchio and Micheli.²⁸ The ESP requires the spectral radius $ρ_{max}^{l}$ of each reservoir layer to be less than 1. Similarly, the input weight matrix $W_{in}^{1}$ and the inter-layer connection matrices $\{W_{in}^{l}\}_{l = 2}^{N_{L}}$ are initialised using a uniform distribution and scaled to a previously specified range $[- iss, iss]$.

For producing the output of the network, the most common approach is to link the states from all reservoir layers to the readout. This configuration, illustrated in Figure 2, computes the output as a weighted sum of the states from all layers, as described by the following equation:

$$
y (k) = W_{out}\, x (k) + θ_{out} = W_{out} \begin{bmatrix} x^{1} (k) \\ x^{2} (k) \\ \vdots \\ x^{N_{L}} (k) \end{bmatrix} + θ_{out} \tag{5}
$$

In this equation, $W_{out} \in R^{N_{Y} \times N_{L} N_{R}}$ denotes the output weight matrix, $x (k) \in R^{N_{L} N_{R}}$ represents the concatenated state vector combining the outputs of all reservoir layers, and $θ_{out} \in R^{N_{Y}}$ is the bias vector for the output.
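The dimensions involved in equation (5) can be checked with a short NumPy sketch. The sizes and random values below are illustrative assumptions; the point is that multiplying $W_{out}$ by the concatenated state is the same as summing one weighted contribution per layer.

```python
import numpy as np

n_l, n_r, n_y = 3, 6, 2
rng = np.random.default_rng(0)

# Stand-ins for the layer states x^1(k), ..., x^{N_L}(k).
layer_states = [rng.uniform(-1.0, 1.0, n_r) for _ in range(n_l)]
W_out = rng.uniform(-1.0, 1.0, size=(n_y, n_l * n_r))
theta_out = np.zeros(n_y)

x_concat = np.concatenate(layer_states)      # shape (N_L * N_R,)
y = W_out @ x_concat + theta_out             # shape (N_Y,)

# Equivalent view: one block of W_out per layer, summed over layers.
y_blocks = theta_out + sum(
    W_out[:, i * n_r:(i + 1) * n_r] @ layer_states[i] for i in range(n_l)
)
```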

Deep ESNs offer several notable advantages compared to Shallow ESNs. These key benefits are detailed in Gallicchio et al.,²⁹ and are briefly outlined as follows. Firstly, Deep ESNs provide a hierarchical structure that enables the modelling of multi-scale temporal dynamics across layers, with higher layers typically exhibiting slower dynamics. Additionally, given the same total number of reservoir units, Deep ESNs have a greater memory capacity, enabling them to capture and utilize information from earlier inputs more effectively. Another benefit is that distributing the reservoir units across multiple layers decreases the number of non-zero recurrent connections, improving computational efficiency. As a result, for a fixed total number of units, Deep ESNs can achieve significantly better performance than Shallow ESNs.
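The reduction in recurrent connections can be made concrete with a small calculation. Assuming, for illustration only, that every layer has the same fraction $c$ of non-zero recurrent weights, a network with $N_{T}$ total units split into $N_{L}$ equal layers has about $c N_{L} (N_{T} / N_{L})^{2} = c N_{T}^{2} / N_{L}$ recurrent connections. The function name and the 10% density below are assumptions, and inter-layer input weights are not counted.

```python
def recurrent_connections(total_units, n_layers, nonzero_percent=10):
    """Approximate number of non-zero recurrent weights for equal-size layers."""
    units_per_layer = total_units // n_layers
    return n_layers * units_per_layer ** 2 * nonzero_percent // 100

for n_layers in (1, 4, 8):
    print(n_layers, recurrent_connections(1000, n_layers))
# 1 100000
# 4 25000
# 8 12500
```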

3. Deep RC implementation

This section describes the proposed steps to implement Deep Reservoir Computing for modelling an industrial process. This implementation approach addresses specific industrial requirements: compared to conventional deep learning, it offers reduced training and deployment time, real-time operation capabilities with reduced inference times, and hardware compatibility through optimized deep structures for low-cost embedded devices. These features make the solution suitable for industrial automation, where cost, timing precision, and rapid deployment are critical.

The implementation process starts with the selection of the best Deep RC architectural setup to address different operational scenarios in the modelled industrial system. The next step is to evaluate the performance of the best setups on their own and against LSTM reference models. Finally, the selected setups are implemented on embedded devices.

3.1. Selection of the best deep architectural setup

To begin with, based on the industrial process to be modelled, different operating scenarios and various configurations of inputs and outputs are defined for the models. These scenarios and input and output configurations are defined according to the characteristics of the industrial process, taking into account the variables of interest and the modelling objectives. Once the scenarios and the input and output configurations of the models have been defined, the first step for the RC implementation is the selection of the best architectural configuration for each of the scenarios and tanks.

The key design consideration for the Deep RC architecture is determining the optimal number of reservoir layers. In addition, other hyperparameters have to be selected in order to generate a faithful model. For this purpose, the performance of the Deep ESN models is evaluated with different numbers of reservoir layers. The aim is to have the performance defined as a function of the number of reservoir layers, thus analysing the performance improvement when adding more reservoir layers to the ESN setup.

The Deep ESN selection process is structured as follows:

First, for each of the previously defined architectural setups, a list of numbers of layers to test is selected, e.g. from 1 to 10 layers. Then, for each number in the range, a search for the best hyperparameters is performed on the training and validation data. When the number of reservoir layers in the model is more than 1, the reservoirs in the different layers are initialised with the same hyperparameters but with different random seeds. Taking into account the randomness involved in ESNs, different models are generated for each set of hyperparameters tested, to make a more objective assessment and to consider the mean of the losses obtained in the validation data.

In order to compare the models, regardless of the number of layers, the total number of internal units in the reservoirs is kept constant and evenly distributed across all layers as more layers are added. For example, if a model is defined with 1000 internal units, with 4 layers each layer will have 250 internal units and with 8 layers each layer will have 125 internal units.
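A possible way to compute the layer sizes is sketched below. The text only gives evenly divisible cases; how a remainder would be handled (for example, 1000 units over 3 layers) is not stated, so assigning the extra units to the first layers is purely an assumption of this sketch, as is the function name `split_units`.

```python
def split_units(total_units, n_layers):
    """Distribute a fixed unit budget as evenly as possible across layers."""
    base, remainder = divmod(total_units, n_layers)
    # Assumption: any leftover units go to the first `remainder` layers.
    return [base + 1 if i < remainder else base for i in range(n_layers)]

print(split_units(1000, 4))  # [250, 250, 250, 250]
print(split_units(1000, 8))  # eight layers of 125 units each
```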

Once the best hyperparameters are found according to the lowest loss on validation data, some new random models are trained with the best parameters, and then the performance of each new model is evaluated on the test dataset. After testing different numbers of layers and getting the mean test loss for each one, the best Deep ESN model is selected based on the calculated metrics: the model with the number of layers that generates the least error.
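The selection procedure described above can be outlined in code as follows. This is a simplified sketch under several assumptions: the hyperparameter search is shown as an exhaustive pass over a list of candidates rather than a Hyperopt search, and `validation_loss` and `test_loss` are hypothetical callables that would train a randomly initialised model with the given seed and return its loss. The toy loss at the end exists only so that the sketch runs.

```python
from statistics import mean

def select_best_depth(layer_counts, candidate_hyperparams, n_seeds,
                      validation_loss, test_loss):
    """Return the layer count with the lowest mean test loss, plus all results."""
    results = {}
    for n_layers in layer_counts:
        # 1) Hyperparameter search: mean validation loss over several random seeds.
        best_hp = min(
            candidate_hyperparams,
            key=lambda hp: mean(validation_loss(n_layers, hp, s) for s in range(n_seeds)),
        )
        # 2) Train new random models with the best hyperparameters; score on test data.
        mean_test = mean(
            test_loss(n_layers, best_hp, s) for s in range(n_seeds, 2 * n_seeds)
        )
        results[n_layers] = (best_hp, mean_test)
    # 3) Keep the number of layers that yields the lowest mean test loss.
    best_layers = min(results, key=lambda n: results[n][1])
    return best_layers, results

# Toy stand-in for training and evaluating a model (hypothetical).
def toy_loss(n_layers, hp, seed):
    return abs(n_layers - 3) + abs(hp["lr"] - 0.3) + 0.01 * seed

best_layers, results = select_best_depth(
    layer_counts=range(1, 6),
    candidate_hyperparams=[{"lr": 0.1}, {"lr": 0.3}, {"lr": 0.9}],
    n_seeds=3,
    validation_loss=toy_loss,
    test_loss=toy_loss,
)
```

Averaging over seeds in both stages reflects the point made above: a single ESN realisation depends on its random initialisation, so individual losses are not a reliable basis for comparison.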

The selection process is then repeated for each of the defined architectural setups. The aim is to find the best Deep ESN setup for each of the tanks, that is, the setup that generates the best performance for the upper tank and the lower tank. The selected setups and the best number of Deep ESN layers in these setups are the ones that will be used in the next steps of the implementation process.

All the ESN models are implemented using the Python library ReservoirPy,³⁰ which allows the user to configure and parameterise the different elements of an ESN (input, reservoir, readout, etc.) as independent nodes that can then be interconnected as desired. ReservoirPy also integrates Hyperopt, another Python library used for hyperparameter search.

3.2. Performance evaluation against LSTM

Once the best Deep ESN setup has been selected for each of the tanks, the next step is to evaluate the performance of the selected models against a reference model. In this case, the reference model is a Long Short-Term Memory network, as it has proven to perform well in time series prediction tasks. The comparison takes into account the following aspects:

Training time: time required to train the model with a defined training dataset.

Sample prediction time: mean time required to predict each of the steps in the time series.

Performance loss metrics.

The aim is to evaluate how ESN improves efficiency over LSTM, not only in terms of performance metrics, but also in terms of training time requirements and prediction speed, which are crucial for real-time applications. In this case, both Shallow ESN and Deep ESN are included to further compare the improvement between both architectures and LSTM.
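The loss metrics are not specified at this point; the abstract reports results in terms of RMSE and $R^{2}$, so a minimal sketch of those two metrics is given here. The function names are illustrative, and the inputs are assumed to be one-dimensional sequences of measured and predicted values.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between measurements and predictions."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 minus residual over total sum of squares."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

levels = [1.0, 2.0, 3.0, 4.0]
print(rmse(levels, [2.0, 3.0, 4.0, 5.0]))  # 1.0
print(r2_score(levels, levels))            # 1.0
```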

The widely used Python libraries TensorFlow³¹ and Keras³² are selected for the implementation of the LSTM models. These libraries allow the user to define and parameterise the different elements of an LSTM model. The optimisation of the models is performed with the Adam algorithm.

3.3. Deep RC on embedded devices

The final step in the implementation process is to deploy the selected Deep RC models on embedded devices with limited computational resources in order to evaluate these models in real-time industrial applications. The evaluation will take into account the following aspects:

Training time: the time required to train the model with a defined training dataset.

Dataset prediction time: the time required to predict the whole test dataset.

Sample prediction time: the mean time required to predict each of the steps in the time series.
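The two prediction timings can be measured with a monotonic clock, as in the sketch below. The helper name `time_predictions` and the `predict_step` callable are hypothetical; in practice `predict_step` would wrap the one-step prediction of the deployed model.

```python
import time

def time_predictions(predict_step, inputs):
    """Return (dataset prediction time, mean sample prediction time, outputs) in seconds."""
    if len(inputs) == 0:
        raise ValueError("inputs must contain at least one sample")
    start = time.perf_counter()
    outputs = [predict_step(u) for u in inputs]
    total = time.perf_counter() - start
    return total, total / len(inputs), outputs

# Dummy one-step predictor used only to exercise the timing helper.
total_s, per_sample_s, outputs = time_predictions(lambda u: 2 * u, [1, 2, 3])
```

Comparing the mean sample prediction time with the sampling period of the plant indicates whether a model can keep up in real time on a given device.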

In this work, three different embedded devices were selected: a Raspberry Pi Zero 2 W, a Magelis IIoT Core Box from Schneider Electric, and a Simatic IoT2050 from Siemens. All of these IoT devices share the same 1GHz quad-core Arm Cortex-A53 CPU, but have different amounts of RAM and other hardware variations. The Raspberry Pi Zero 2 W is a low-cost, widely available device with 512MB of RAM. The Magelis IIoT Core Box is a more powerful device designed for edge computing tasks in industrial environments, with 1GB of RAM. Finally, the Simatic IoT2050, also designed for industrial IoT tasks, is a device with 2GB of RAM. The local PC used for hyperparameter search, evaluation and model selection tasks has a 3.4GHz seventh-generation Intel Core i5 processor and 8GB of RAM.

4. Experimentation system

This section provides a description of the experimental environment used in this work. In addition, the different operating scenarios considered for the four-tank system are detailed, as well as the different setups for the Deep ESN models that will be evaluated. Finally, the datasets used to work with the plant are described.

4.1. Description of the four-tank system

The experimental setup, located at the Remote Laboratory of Automatic Control at the University of León, consists of a real industrial implementation of the well-known four-tank system, initially proposed by Karl Henrik Johansson.³³ The industrial plant used for this work and its schematic representation are shown in Figure 3.

Figure 3.

Industrial plant and its diagram. (a) Industrial plant used and (b) Industrial plant diagram.

This plant is composed of four water tanks arranged in two vertical pairs. Water is supplied to the four tanks from a lower reservoir tank using two twin centrifugal pumps controlled by variable frequency drives, generating a multi-level control scenario. As the water leaving an upper tank feeds the one immediately below it, the level in the latter depends on the level in the former, creating a set of interconnected dynamics.

The water driven by the pumps is distributed to the tanks by means of two three-way pneumatic valves, so that each pump-valve set delivers water to two of the four tanks. In addition, the valves and tanks are cross-connected so that each valve supplies water to the adjacent lower tank and the opposite upper tank.

This configuration creates a complex and interdependent control system, where the water levels in the lower tanks are affected not only by the pumps and valves but also by the interactions with the other tanks. Finally, to measure water levels, each tank is equipped with a pressure sensor at the base. Table 1 outlines the main inputs and outputs of the plant that will be used for the models.

The mathematical representation of the four-tank process is derived from the fundamental principles of fluid dynamics, namely Bernoulli’s principle and the mass balance law. These principles lead to a set of differential equations that describe the evolution of water levels in each tank over time. To model the system’s behaviour, a state-space representation is employed, which is widely used in control engineering due to its ability to capture the dynamic relationships between system variables.
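For reference, the textbook form of these equations, as commonly given for the quadruple-tank process introduced by Johansson,³³ is reproduced below. The notation is an assumption made here and is not defined elsewhere in this work: $h_{i}$ is the level of tank $i$, $A_{i}$ its cross-section, $a_{i}$ the cross-section of its outlet, $g$ the acceleration of gravity, $v_{j}$ the signal applied to pump $j$, $k_{j}$ the corresponding pump gain, and $\gamma_{j} \in [0, 1]$ the fraction of flow that valve $j$ sends to the lower tank. The physical plant works with percentage signals, so its parameters would differ from this idealised form.

```latex
\begin{aligned}
\frac{dh_{1}}{dt} &= -\frac{a_{1}}{A_{1}} \sqrt{2 g h_{1}} + \frac{a_{3}}{A_{1}} \sqrt{2 g h_{3}} + \frac{\gamma_{1} k_{1}}{A_{1}} v_{1} \\
\frac{dh_{2}}{dt} &= -\frac{a_{2}}{A_{2}} \sqrt{2 g h_{2}} + \frac{a_{4}}{A_{2}} \sqrt{2 g h_{4}} + \frac{\gamma_{2} k_{2}}{A_{2}} v_{2} \\
\frac{dh_{3}}{dt} &= -\frac{a_{3}}{A_{3}} \sqrt{2 g h_{3}} + \frac{(1 - \gamma_{2}) k_{2}}{A_{3}} v_{2} \\
\frac{dh_{4}}{dt} &= -\frac{a_{4}}{A_{4}} \sqrt{2 g h_{4}} + \frac{(1 - \gamma_{1}) k_{1}}{A_{4}} v_{1}
\end{aligned}
```

The square-root outflow terms come from Bernoulli's principle and are the source of the nonlinearity, while the cross terms ($h_{3}$ in the first equation, $h_{4}$ in the second) express the coupling between each upper tank and the tank below it.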

However, translating this theoretical model into a physical industrial setup introduces real-world complexities. This system was specifically selected because it encapsulates several common challenges in industrial environments, such as nonlinear dynamics, measurement noise, dead zones, and actuator asymmetries. These factors lead to discrepancies between the theoretical model and actual behaviour, reflecting conditions typically found in industrial automation scenarios. As such, the four-tank system serves as both a controlled experimental platform and a representative proxy for real industrial processes. Addressing its complexities requires advanced modelling techniques, such as machine learning approaches, that can better capture and compensate for nonlinear behaviours, overcoming the limitations of traditional linear control methods.

4.2. Definition of operating scenarios and model setups

The four-tank plant as a whole operates as a multivariate control system, where pumps (P01, P02) and valves (V01 and V02) are the input variables, and the desired water levels in each tank (L01, L02, L03 and L04) are the output variables. Therefore, the first option considered is to model the entire plant as a single system, where the inputs are the pumps and valves and the outputs are the levels of the tanks. This configuration is called 4-TANK and it is the most complex, as it takes into account all the inputs and outputs of the system. This setup is the one represented in Figure 3(b).

However, defining a single model of the whole process that performs well in all circumstances is not straightforward, as it implies greater complexity and difficulty. It is therefore proposed to divide the process into different subsystems, each focused on predicting the level of a single tank. This allows the models developed with these subsystems to be compared with the single model of the whole plant, to see which of them gives better results.

By considering the system in terms of different subsystems, it is possible to work with different operating scenarios depending on which system inputs are operating at any given time. Thus, if each pump-valve set is considered as a single input or source of water flow, two main operating scenarios can occur in the four-tank process: if only one pump and one valve are activated, a Single Input-Single Output (SISO) scenario occurs, while if both pump-valve sets are activated, a Multiple Input-Multiple Output (MIMO) scenario occurs.

In this work, therefore, apart from the whole setup 4-TANK it is proposed to work with a SISO-type scenario and a MIMO-type scenario, defining different setups in both scenarios. Assuming a symmetrical plant configuration, with two separated sets of two tanks connected in series (tank 3 and tank 1, tank 4 and tank 2), the set of tanks 3 and 1 is considered as a reference for this work, with tank 3 being chosen as the upper tank and tank 1 as the lower tank. The focus of this paper is therefore to find the best Deep RC setup for modelling the upper and lower tanks in these scenarios, selecting the setup that has the best performance for each of them.

Firstly, Figure 4 shows the scheme of the four-tank system considered for the SISO scenario, where tank 3 is the upper tank and tank 1 is the lower tank. In this scenario, there is an upper tank that can be supplied with water by the opposite pump-valve set (in this case P02 and V02) and a lower tank that receives the water discharged from the upper tank. This is the scenario that was the subject of the Deep ESN work previously developed with the four-tank system.

Table 1.
Variables of the quadruple-tank process.

TAG	Variable	Range
L01	Level of Tank 1	0–100%
L02	Level of Tank 2	0–100%
L03	Level of Tank 3	0–100%
L04	Level of Tank 4	0–100%
V01	Valve for Tank 1 & 4	0–100%
V02	Valve for Tank 2 & 3	0–100%
P01	Pump for Tank 1 & 4	0–100%
P02	Pump for Tank 2 & 3	0–100%

Figure 4.

SISO scenario for the four-tank system.

In the SISO scenario, the modelling of the upper tank is more straightforward: the process inputs that influence the tank level, to be used as model inputs, are the pump P02 and the valve V02, while the output is the tank level L03. In this case, the setup of inputs and outputs considered is called SISO-UP, and there are no other alternative variables that can be used as model inputs.

In contrast, the modelling of the lower tank is more complex, as the lower tank level L01 depends on the upper tank level L03 and, consequently, on the system inputs P02 and V02. Therefore, two different configurations have been considered for the lower tank in the SISO scenario: one using the upper tank level L03 as model input, called SISO-LO-A, and one using pump P02 and valve V02 as model inputs, called SISO-LO-B. In both setups, the output of the proposed model is the lower tank level L01. It is important to note that the SISO-LO-A configuration is only possible if the previous value of the upper tank level L03 is accessible; otherwise there is no alternative but to use the SISO-LO-B setup.

Secondly, Figure 5 shows the scheme of the four-tank system considered for the MIMO scenario, where tank 3 and tank 1 are again the upper and lower tanks, respectively. In this scenario, the upper tank is again supplied with water by the opposite pump-valve set (P02 and V02), and the lower tank receives the water discharged from the upper tank and also the water from the adjacent pump-valve set (P01 and V01).

Figure 5.

MIMO scenario for the four-tank system.

As can be observed in the figure, the model for the upper tank is analogous to that defined in the SISO scenario, with identical inputs influencing the level of the tank. Therefore, the SISO-UP setup can be reused for the upper tank in the MIMO scenario, and there is no need to define a new one.

The modelling of the lower tank is more complex than in the SISO scenario: the lower tank level L01 depends on the upper tank level L03, and thus on the system inputs P02 and V02, but the tank also receives the flow from the adjacent pump and valve (system inputs P01 and V01). Therefore, two different setups are considered for the lower tank: one using as inputs the upper level L03 together with the pump P01 and the valve V01, called MIMO-LO-A; and another one using the pumps P01 and P02 together with the valves V01 and V02, called MIMO-LO-B. In both setups, the output of the proposed model is the lower tank level L01 and, as in the SISO scenario, if the previous value of the upper level L03 is not accessible, there is no choice but to use the MIMO-LO-B setup.

Table 2 summarizes the input and output configurations for each of the evaluated setups. These setups can be organised depending on the tank level to be modelled as follows:

Table 2.

Definition of the different architectural setups to evaluate.

Setup name	Inputs	Outputs
SISO-UP	P02, V02	L03
SISO-LO-A	L03	L01
SISO-LO-B	P02, V02	L01
MIMO-LO-A	P01, V01, L03	L01
MIMO-LO-B	P01, V01, P02, V02	L01
4-TANK	P01, V01, P02, V02	L01-L04

To predict the level of the upper tank, for SISO and MIMO scenarios (it receives the water from the opposite pump in both):

SISO-UP

4-TANK

To predict the level of the lower tank, for SISO scenario (it receives the water from the opposite pump):

SISO-LO-A

SISO-LO-B

4-TANK

To predict the level of the lower tank, for MIMO scenario (it receives the water from opposite and adjacent pumps):

MIMO-LO-A

MIMO-LO-B

4-TANK
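
The setups above can be captured as a small lookup structure, which makes it easy to build the input and target vectors of any setup from one logged record. The following is a minimal sketch: the dictionary contents are transcribed from Table 2 and from the grouping above, while the names `SETUPS`, `CANDIDATES` and `select_columns` are hypothetical and introduced only for illustration. Lagging L03 by one step for the "A" setups is not handled here.

```python
# Input/output variables of each setup, transcribed from Table 2.
SETUPS = {
    "SISO-UP": {"inputs": ["P02", "V02"], "outputs": ["L03"]},
    "SISO-LO-A": {"inputs": ["L03"], "outputs": ["L01"]},
    "SISO-LO-B": {"inputs": ["P02", "V02"], "outputs": ["L01"]},
    "MIMO-LO-A": {"inputs": ["P01", "V01", "L03"], "outputs": ["L01"]},
    "MIMO-LO-B": {"inputs": ["P01", "V01", "P02", "V02"], "outputs": ["L01"]},
    "4-TANK": {"inputs": ["P01", "V01", "P02", "V02"],
               "outputs": ["L01", "L02", "L03", "L04"]},
}

# Candidate setups for each (tank, scenario) pair, as grouped in the text.
CANDIDATES = {
    ("upper", "SISO"): ["SISO-UP", "4-TANK"],
    ("upper", "MIMO"): ["SISO-UP", "4-TANK"],
    ("lower", "SISO"): ["SISO-LO-A", "SISO-LO-B", "4-TANK"],
    ("lower", "MIMO"): ["MIMO-LO-A", "MIMO-LO-B", "4-TANK"],
}


def select_columns(record, setup_name):
    """Split one logged record (a dict keyed by tag) into model inputs and targets."""
    setup = SETUPS[setup_name]
    x = [record[tag] for tag in setup["inputs"]]
    y = [record[tag] for tag in setup["outputs"]]
    return x, y
```

For example, `select_columns(record, "MIMO-LO-A")` returns the values of P01, V01 and L03 as inputs and the value of L01 as the target.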

4.3. Description of the datasets

The datasets used in this work were obtained from the controller of the quadruple-tank system, where the water levels in the tanks and the pump and valve settings were recorded. Two different datasets were defined and captured: one for the SISO scenario, with just pump P02 and valve V02 connected, and another one for the MIMO scenario, with both pumps and valves working.

For both scenarios, the system operates in a closed-loop control mode with pumps as actuators to avoid saturation of the tanks and to maintain the water levels within the desired setpoints. Random setpoints for the levels and random opening percentages for the valves were therefore defined every 60 seconds, assuming that this would be sufficient time for the water levels to reach the steady state.

Each dataset was collected over 24 hours, with a sampling period of 100 milliseconds. The datasets were then resampled to a period of 5 seconds, which gave better performance with the designed models in previous works with the four-tank system.²¹ Finally, the datasets were divided into three parts: training, validation, and test, with a ratio of 60%, 20% and 20%, respectively.
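
The preprocessing described above can be sketched as follows. The text does not state how the resampling was performed or whether the split was chronological, so both choices here are assumptions: block averaging is used for resampling, and the split keeps the samples in time order. The function names are hypothetical.

```python
def resample_mean(samples, factor):
    """Average consecutive blocks of `factor` samples; a trailing partial block is dropped."""
    n_blocks = len(samples) // factor
    return [sum(samples[i * factor:(i + 1) * factor]) / factor for i in range(n_blocks)]


def chronological_split(samples, train_pct=60, val_pct=20):
    """Split in time order into training, validation and test parts (no shuffling)."""
    n_train = len(samples) * train_pct // 100
    n_val = len(samples) * val_pct // 100
    return (samples[:n_train],
            samples[n_train:n_train + n_val],
            samples[n_train + n_val:])


RAW_PERIOD_MS = 100       # sampling period of the recorded data
TARGET_PERIOD_MS = 5000   # period after resampling
FACTOR = TARGET_PERIOD_MS // RAW_PERIOD_MS                 # 50 raw samples per resampled one
RAW_PER_DAY = 24 * 60 * 60 * 1000 // RAW_PERIOD_MS         # 864000 raw samples in 24 hours
RESAMPLED_PER_DAY = RAW_PER_DAY // FACTOR                  # 17280 samples after resampling
```

With these figures, a 24-hour record shrinks from 864,000 raw samples to 17,280 resampled ones, which is what keeps training times short in the experiments that follow.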

5. Results

This section presents the results obtained from the implementation of the Deep RC models on the four-tank industrial process. The results are structured in three main sections: the selection of the best Deep RC architectural setup, the comparison of the selected models with LSTM reference models, and the implementation of the selected models on embedded devices.

5.1. Selection of the best architectural setup

The selection of the best Deep RC architecture is based on the performance of the models with different numbers of reservoir layers. More specifically, the models are evaluated with a range of layers from 1 to 10, and the performance is calculated for each number of layers on the test dataset, in terms of the Root Mean Squared Error (RMSE) and the Coefficient of Determination ( $R^{2}$ ) metrics. The evaluation is performed for each of the defined scenarios and model configurations.
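
For reference, the two metrics can be computed as in the following minimal sketch, which uses the standard definitions of RMSE and of $R^{2}$ as one minus the ratio of the residual to the total sum of squares. The function names are illustrative and are not taken from the authors' code.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between targets and predictions."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

A perfect prediction gives an RMSE of 0 and an $R^{2}$ of 1, while a model that always predicts the mean of the targets gives an $R^{2}$ of 0.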

The hyperparameters of the Deep ESN models are selected from the values shown in Table 3, using a random search algorithm. Thus, at each iteration, a set of hyperparameters is randomly drawn from the defined values, and the model is initialised with these hyperparameters. As the performance of an ESN model is known to improve with a larger number of reservoir internal units, this parameter is fixed at $1200$, a value that has worked well in previous experiments²¹ and that avoids unnecessarily large models. This total is the same regardless of the number of layers, and it is divided among all the reservoirs. The spectral radius is tested in the range between $0.1$ and $1.3$, knowing that the best performance is usually obtained with values close to $1$. The leaking rate is tested with values distributed between $1 \times 10^{-6}$ and $1$, and the ridge regularizer in the range between $1 \times 10^{-9}$ and $1 \times 10^{-1}$.

Table 3.
Tested ranges of ESN hyperparameters and best options selected for SISO and MIMO scenarios.

Parameter	Tested values	Best SISO	Best MIMO
Total internal units	1200	1200	1200
Spectral radius	0.1, 0.5, 0.9, 1.3	0.9	0.9
Leaking rate	$1 \times 10^{-6}$, $1 \times 10^{-4}$, $1 \times 10^{-2}$, $1 \times 10^{0}$	$1 \times 10^{0}$	$1 \times 10^{0}$
Input scaling	1	1	1
Inter-layer scaling	0.1, 0.5, 1, 1.5	0.1	0.1
Input bias scaling	0.2, 0.4, 0.8, 1	1	0.8
$W_{res}^{l}$ connectivity	0.15	0.15	0.15
$W_{in}^{l}$ connectivity	0.15	0.15	0.15
Ridge regularizer	$1 \times 10^{-9}$, $1 \times 10^{-6}$, $1 \times 10^{-3}$, $1 \times 10^{-1}$	$1 \times 10^{-6}$	$1 \times 10^{-3}$

In contrast, the scaling of the first input weight matrix $W_{in}^{1}$ is set to $1$, a value that has worked adequately in previous experiments with Shallow ESNs. For the scaling of the inter-reservoir matrices $\{W_{in}^{l}\}_{l=2}^{N_{L}}$, different values are tested between $0.1$ and $1.5$, to evaluate which one produces better results in the model. The input bias scaling is tested in the range between $0.2$ and $1$. Finally, the connectivity of both the input weight matrices $W_{in}^{l}$ and the reservoir weight matrices $W_{res}^{l}$ is set to $0.15$, a value that has also shown good results in previous experiments.²¹
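
To make the role of each hyperparameter concrete, the sketch below builds and runs a generic leaky-integrator Deep ESN with NumPy. It is an illustration under stated assumptions, not the authors' implementation: weights are drawn uniformly, the total number of units is split as evenly as possible with earlier layers taking the remainder, each layer is driven by the current state of the previous one, and the readout is a ridge regression on the concatenated states plus a bias. All function names are hypothetical.

```python
import numpy as np


def split_units(total_units, n_layers):
    """Spread total_units over n_layers; the first layers absorb any remainder (assumption)."""
    base, extra = divmod(total_units, n_layers)
    return [base + (1 if i < extra else 0) for i in range(n_layers)]


def sparse_uniform(rng, shape, connectivity, scale):
    """Uniform weights in [-scale, scale], keeping roughly a `connectivity` fraction of them."""
    weights = rng.uniform(-scale, scale, size=shape)
    return weights * (rng.random(shape) < connectivity)


def build_deep_esn(n_inputs, total_units=1200, n_layers=7, spectral_radius=0.9,
                   input_scaling=1.0, inter_scaling=0.1, bias_scaling=1.0,
                   connectivity=0.15, seed=0):
    """Return a list of (W_in, bias, W_res) tuples, one per reservoir layer."""
    rng = np.random.default_rng(seed)
    layers, fan_in = [], n_inputs
    for index, n_units in enumerate(split_units(total_units, n_layers)):
        # The first layer uses the input scaling, deeper layers the inter-layer scaling.
        scale = input_scaling if index == 0 else inter_scaling
        w_in = sparse_uniform(rng, (n_units, fan_in), connectivity, scale)
        bias = rng.uniform(-bias_scaling, bias_scaling, size=n_units)
        w_res = sparse_uniform(rng, (n_units, n_units), connectivity, 1.0)
        # Rescale the recurrent matrix so its largest |eigenvalue| equals spectral_radius.
        w_res *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w_res)))
        layers.append((w_in, bias, w_res))
        fan_in = n_units
    return layers


def run_deep_esn(layers, inputs, leaking_rate=1.0):
    """Drive the stacked reservoirs and return the concatenated states for every time step."""
    states = [np.zeros(w_res.shape[0]) for _, _, w_res in layers]
    collected = []
    for u in inputs:
        drive = u
        for i, (w_in, bias, w_res) in enumerate(layers):
            pre_activation = w_in @ drive + bias + w_res @ states[i]
            states[i] = (1 - leaking_rate) * states[i] + leaking_rate * np.tanh(pre_activation)
            drive = states[i]  # the next layer is fed with this layer's new state
        collected.append(np.concatenate(states))
    return np.asarray(collected)


def fit_ridge_readout(states, targets, ridge=1e-6):
    """Closed-form ridge regression on [states, 1]; the only trained part of the model."""
    x = np.hstack([states, np.ones((len(states), 1))])
    gram = x.T @ x + ridge * np.eye(x.shape[1])
    return np.linalg.solve(gram, x.T @ targets)
```

With 1200 total units and a single output, the readout has 1201 coefficients, which is the only part of the model that is trained.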

The best hyperparameters for each scenario, model configuration and number of layers are chosen based on the lowest RMSE value obtained on the validation dataset. Even though there are minor variations depending on the number of layers used, the best hyperparameters are the same for almost all the tested setups within the same scenario. Those best values are also shown in Table 3.
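
The selection procedure can be sketched as a plain random search over the candidate values of Table 3, keeping the configuration with the lowest validation RMSE. The number of iterations is not given in the text, so it is left as a parameter. The names `random_search` and `toy_validation_rmse` are hypothetical, and the toy objective is only a stand-in for training a Deep ESN and scoring it on the validation set.

```python
import random

# Candidate values transcribed from Table 3 (fixed hyperparameters are omitted).
SEARCH_SPACE = {
    "spectral_radius": [0.1, 0.5, 0.9, 1.3],
    "leaking_rate": [1e-6, 1e-4, 1e-2, 1.0],
    "inter_layer_scaling": [0.1, 0.5, 1.0, 1.5],
    "input_bias_scaling": [0.2, 0.4, 0.8, 1.0],
    "ridge": [1e-9, 1e-6, 1e-3, 1e-1],
}


def random_search(validation_rmse, n_iterations, seed=0):
    """Draw random configurations and keep the one with the lowest validation RMSE."""
    rng = random.Random(seed)
    best_config, best_score = None, float("inf")
    for _ in range(n_iterations):
        config = {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}
        score = validation_rmse(config)
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_validation_rmse(config):
    """Stand-in objective for demonstration; replace with real training and validation."""
    return abs(config["spectral_radius"] - 0.9) + abs(config["inter_layer_scaling"] - 0.1)
```

The grid in Table 3 contains $4^{5} = 1024$ combinations, so random search with a modest budget explores only part of it, which is usually acceptable because a few hyperparameters dominate the result.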

Once the best hyperparameters for each setup are defined, new randomly initialised models are trained for each number of layers within the given range, and the performance of the models is evaluated on the test dataset. Given the randomness implicit in the ESNs, this process is repeated five times to analyse the variability of the results, calculating the mean loss and its standard deviation.
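
A minimal sketch of this repetition is given below. It assumes a hypothetical callable `train_and_score(seed=...)` that trains one randomly initialised model and returns its test RMSE, and it uses the sample standard deviation; the text does not specify which estimator was used.

```python
import statistics


def repeat_runs(train_and_score, n_runs=5, base_seed=0):
    """Train n_runs independently initialised models and summarise their test RMSE."""
    scores = [train_and_score(seed=base_seed + i) for i in range(n_runs)]
    return statistics.mean(scores), statistics.stdev(scores)
```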

The mean test RMSE obtained and the corresponding standard deviation are presented in Figures 6 to 8, for the SISO-UP, SISO-LO and MIMO-LO scenarios, respectively. It should be noted that the 4-TANK setup is included in all the scenarios, particularised for the corresponding tank, and considering the value of the deactivated inputs as 0.

Figure 6.

Mean RMSE and its standard deviation obtained with Deep ESN for the upper tank.

Figure 7.

Mean RMSE and its standard deviation obtained with Deep ESN for the lower tank in SISO scenario.

Figure 8.

Mean RMSE and its standard deviation obtained with Deep ESN for the lower tank in MIMO scenario.

The results show that the performance of the models improves as the number of layers increases, with the RMSE decreasing and the $R^{2}$ increasing. However, the improvement is not linear, and in no setup is the best performance achieved with the maximum of 10 layers. The best number of layers for each setup is selected based on the lowest RMSE and the highest $R^{2}$ obtained on the test dataset. The best Deep ESN setups for each of the tanks are shown in Table 4, compared to the equivalent Shallow ESN setup, which is the same Deep ESN but with only one reservoir layer. The table also shows the percentage of improvement in RMSE and $R^{2}$ obtained with the best Deep ESN setup compared to the Shallow ESN setup.

Table 4.

Comparison of RMSE and $R^{2}$ for the best-performing Deep ESN and Shallow ESN setups on test data.

		Shallow ESN		Best Deep ESN			Improvement (%)
Tank (scenario)	Setup	RMSE	$R^{2}$	RMSE	$R^{2}$	Layers	RMSE	$R^{2}$
Upper tank (ALL)	SISO-UP	0.0520	0.9699	0.0456	0.9768	7	12.31%	0.71%
	4-TANK (L03)	0.0569	0.9631	0.0465	0.9753	6	18.28%	1.27%
Lower tank (SISO)	SISO-LO-A	0.0421	0.9804	0.0328	0.9881	9	22.09%	0.79%
	SISO-LO-B	0.1194	0.8424	0.0823	0.9244	9	31.06%	9.73%
	4-TANK (L01)	0.1332	0.8010	0.0881	0.9128	9	33.86%	13.95%
Lower tank (MIMO)	MIMO-LO-A	0.1133	0.8585	0.0900	0.9107	7	20.58%	6.08%
	MIMO-LO-B	0.1526	0.7430	0.1176	0.8474	8	22.94%	14.05%
	4-TANK (L01)	0.1547	0.7361	0.1201	0.8407	9	22.36%	12.44%
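
The improvement columns are not accompanied by a formula. The sketch below assumes that each improvement is expressed relative to the Shallow ESN value, which reproduces the SISO-UP row. Because the table entries are rounded, other rows may differ slightly when recomputed this way. The function names are illustrative.

```python
def rmse_improvement(shallow, deep):
    """Relative RMSE reduction of the deep model, in percent of the shallow RMSE."""
    return 100.0 * (shallow - deep) / shallow


def r2_improvement(shallow, deep):
    """Relative R^2 increase of the deep model, in percent of the shallow R^2."""
    return 100.0 * (deep - shallow) / shallow


# SISO-UP row of Table 4.
siso_up_rmse_gain = round(rmse_improvement(0.0520, 0.0456), 2)  # 12.31
siso_up_r2_gain = round(r2_improvement(0.9699, 0.9768), 2)      # 0.71
```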

Figure 9.

Prediction for lower and upper tanks with the selected best Deep ESN setups. (a) Prediction for the lower tank (tank 1) with the best Deep ESN MIMO-LO-A setup and (b) Prediction for the upper tank (tank 3) with the best Deep ESN SISO-UP setup.

It can be observed that the improvement in RMSE ranges from 12.31% to 33.86%, and the improvement in $R^{2}$ ranges from 0.71% to 14.05%. The SISO-UP setup for the upper tank is the one that shows the least improvement, with a 12.31% reduction in RMSE and a 0.71% increase in $R^{2}$. This smaller improvement can be explained by the fact that the upper tank is supplied with water directly from the pump and valve, and therefore has simpler and more direct dynamics, with fewer distinct time scales. However, as shown in Figure 6, it is still a better option than using the 4-TANK setup, which is the most complex and complete setup but also makes it more difficult to obtain an acceptable prediction without a higher number of internal units.

In contrast, excluding the 4-TANK setup, the lower-tank setup with the best improvement is SISO-LO-B, with a 31.06% reduction in RMSE and a 9.73% increase in $R^{2}$. The MIMO-LO-B setup also shows a significant improvement, with a 22.94% reduction in RMSE and a 14.05% increase in $R^{2}$. This can be explained by the fact that the lower tank has more complex dynamics, with the level of the tank depending on the level of the upper tank and the water flow from the adjacent pump and valve. More specifically, the level depends on the opposite pump and valve setpoints, which generates slower dynamics, as changes in P02 and V02 affect the upper tank first and need more time to influence the lower tank level.

The best Deep ESN setups are selected based on the best performance metrics obtained as follows. On the one hand, for the upper tanks there are two possible setups for both SISO and MIMO scenarios, which are the SISO-UP and the 4-TANK setup. Among these setups, the best one is the SISO-UP setup with a Deep ESN of 7 layers.

On the other hand, for the lower tank there are three possible setups for the SISO scenario and three setups for the MIMO scenario. Among these setups, the best one in the SISO scenario is the SISO-LO-A setup with a Deep ESN of 9 layers, and the best one in the MIMO scenario is the MIMO-LO-A setup with a Deep ESN of 7 layers.

Taking this into account, when only P02 and V02 are working (SISO scenario), the SISO-LO-A setup with a Deep ESN of 9 layers can be used to model the level of the lower tanks. However, if both pumps and valves are connected (MIMO scenario), the MIMO-LO-A setup with a Deep ESN of 7 layers should be used instead. For this reason, the MIMO-LO-A setup is selected to model the levels of the lower tanks, as it is the one that can be used in both scenarios.

Therefore, the SISO-UP and MIMO-LO-A setups are the most suitable for modelling the upper and lower tanks, respectively, and are the ones that will be used in the next steps of the implementation process. To conclude this step, Figure 9 shows the performance of the best Deep ESN setups compared to the real levels for each of the tanks.

5.2. Comparison with LSTM models

The next step in the implementation process is to evaluate the performance of the selected Deep ESN setups against LSTM reference models. The LSTM models are trained, validated and tested with the same data as the ESN models. As two different setups were selected for the Deep ESN models, one to predict the level of the upper tanks (SISO-UP) and another one to predict the level of the lower tanks (MIMO-LO-A), two LSTM models are trained, one for each setup.

The hyperparameters of the LSTM models are selected from the ranges of values shown in Table 5, using a Tree-structured Parzen Estimator (TPE) algorithm in this case. This algorithm is chosen because it is more suitable for models with long training times and slower convergence, such as LSTM models.

Table 5.
Tested ranges of LSTM hyperparameters and best options selected.

Parameter	Tested values	Best upper	Best lower
Number of units	8, 16, 32, 64	64	64
Dropout	0.0, 0.1, 0.2, 0.4	0.2	0.4
Learning rate	$1 \times 10^{-4}$ to $1 \times 10^{-2}$	$1.6 \times 10^{-4}$	$3.7 \times 10^{-4}$
Epochs	200	200	200

The comparison between the Deep ESN and LSTM models is performed in terms of RMSE and $R^{2}$ metrics, calculated on the test dataset. In addition, the training time for the whole training data and the prediction time per sample are also considered. Shallow ESN models for the same setups are also included in the comparison, to evaluate the improvement obtained with the Deep ESN models. The results obtained for the SISO-UP and MIMO-LO-A setups are shown in Table 6.

Table 6.

Performance of LSTM compared to the selected Shallow ESN and Deep ESN setups.

Setup	Model	Fit Time (s)	Pred Time (ms/step)	RMSE	$R^{2}$
SISO-UP	LSTM	4322.44	439.11	0.0499	0.9694
	Shallow ESN	6.29	2.99	0.0520	0.9699
	Deep ESN	11.90	3.99	0.0456	0.9768
MIMO-LO-A	LSTM	4312.57	603.46	0.1029	0.8701
	Shallow ESN	7.05	2.99	0.1133	0.8585
	Deep ESN	12.15	4.99	0.0900	0.9107

The table shows that, even with similar results in terms of RMSE and $R^{2}$, the Deep ESN models slightly outperform the LSTM models, while the Shallow ESN models slightly underperform the LSTM models. However, the main improvement is seen in the training and prediction times: LSTM models take on average about 4317 seconds to train, while Deep ESN models require about 12 seconds. In terms of prediction time per step, LSTM models take on average about 521 milliseconds, compared to between 4 and 5 ms for Deep ESN models.

Thus, while Shallow ESNs are already able to outperform LSTM models in terms of training and prediction time, Deep ESN models not only continue to outperform in these terms, but also improve in RMSE and $R^{2}$ . Therefore, Deep ESN models are the most suitable for implementation in embedded devices, as they are able to offer similar performance to LSTM models, but with much shorter training and prediction times.

Moreover, in order to obtain a performance similar to that of Deep ESN models, LSTM models need to be much larger, with many more trainable parameters than ESN models: in the performed experiments, about 17217 trainable weights are handled in LSTM models compared to 1201 in ESN models, taking into account that in ESN models only the 1200 readout weights and the output bias are trained.
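
The parameter counts can be checked with a short calculation. The LSTM architecture is not detailed here, so the sketch below assumes a single LSTM layer with 64 units (the best value in Table 5) fed with the two SISO-UP inputs and followed by one dense output neuron. Under that assumption the standard LSTM parameter formula reproduces the reported figure. The function names are illustrative.

```python
def lstm_layer_params(n_inputs, n_units):
    """Four gates, each with input weights, recurrent weights and a bias."""
    return 4 * (n_inputs + n_units + 1) * n_units


def dense_params(n_inputs, n_outputs):
    """Fully connected layer: one weight per input plus a bias, for each output."""
    return (n_inputs + 1) * n_outputs


def esn_readout_params(total_units, n_outputs=1):
    """Linear readout over all reservoir units plus an output bias."""
    return (total_units + 1) * n_outputs


lstm_total = lstm_layer_params(2, 64) + dense_params(64, 1)  # 17152 + 65 = 17217
esn_total = esn_readout_params(1200)                         # 1201
```

Under the same assumption, the three-input MIMO-LO-A model would have slightly more LSTM weights, while the ESN readout size is unchanged because it depends only on the number of reservoir units.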

While Shallow ESNs already offer a good balance between simplicity and efficiency, the results show that Deep ESNs provide additional improvements in accuracy, especially in more complex setups such as MIMO-LO-A. In these scenarios, where dynamic interactions and multi-scale temporal behaviours are more significant, the layered structure of Deep ESNs captures the underlying system dynamics more effectively. Although adding more reservoir layers slightly increases computational time, the improvement in prediction performance makes Deep ESNs particularly valuable for industrial systems that require greater modelling accuracy.

5.3. Performance on embedded devices

The final step in the implementation process is to deploy the selected Deep ESN models on embedded devices with limited computational resources, to evaluate the performance of the models in real-time industrial applications. In this case, both selected Deep ESN models for the SISO-UP and MIMO-LO-A setups are implemented, taking into account the training time, the prediction time for the whole test dataset, and the prediction time per sample. The evaluation is performed on three different embedded devices: a Raspberry Pi Zero 2 W, a Magelis IIoT Core Box from Schneider Electric, and a Simatic IoT2050 from Siemens. The results obtained are compared with the performance metrics obtained on a desktop PC, which was used for the previous training, hyperparameter search, evaluation and model selection tasks.

The performance metrics obtained for the selected Deep ESN models on the different devices are shown in Table 7. The table shows the mean training time, the mean prediction time for the whole test dataset, and the mean prediction time per sample for the SISO-UP and MIMO-LO-A setups.

Table 7.
Mean performance metrics for embedded Deep ESN models on a Raspberry Pi Zero 2 W (Pi Zero), a Magelis IIoT Core Box (IIoT Box), and a Simatic IoT2050 (IoT 2050). The performance metrics are evaluated with a set of 10,000 training samples and 4,000 testing samples.

Setup	Metric	Local PC	Pi Zero	IIoT Box	IoT 2050
SISO-UP	Training time (s)	10.797	96.353	116.947	113.418
	Dataset prediction time (s)	4.073	33.593	42.161	41.398
	Sample prediction time (ms/sample)	1.018	8.390	10.532	10.340
MIMO-LO-A	Training time (s)	11.443	100.159	118.177	112.488
	Dataset prediction time (s)	4.476	33.589	42.717	41.270
	Sample prediction time (ms/sample)	1.118	8.386	10.671	10.309
Setup mean	Training time (s)	11.120	98.256	117.562	112.953
	Dataset prediction time (s)	4.274	33.591	42.439	41.334
	Sample prediction time (ms/sample)	1.068	8.388	10.601	10.324

It can be seen that the three embedded devices have similar training times, averaging between 98 and 118 seconds to train on the full training set. The prediction time for the whole test dataset is between 34 and 42 seconds on the embedded devices, compared to about 4 seconds on the local PC. Finally, the prediction time per sample is between 8 and 11 ms on the embedded devices, compared to about 1 ms on the local PC.

The results show that the mean training times are significantly higher on the embedded devices than on the local PC, with an increase of about 9 to 11 times. However, the prediction times are still acceptable for real-time applications, with a mean prediction time per sample of about 10 ms, which is suitable even for the control of industrial processes. Therefore, the results show that the selected Deep ESN models are able to function in real time on embedded devices, with a suitable performance for industrial applications.
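
These figures can be checked with a short calculation on the setup-mean rows of Table 7. The sketch below computes the slowdown of each device relative to the local PC and the fraction of a sampling period taken by one prediction. The two periods used in the usage note, 100 ms and 5 s, are the raw and resampled periods of the datasets; treating them as the real-time budget is an assumption made for illustration. The function names are hypothetical.

```python
# Setup-mean rows of Table 7.
TRAIN_S = {"Local PC": 11.120, "Pi Zero": 98.256, "IIoT Box": 117.562, "IoT 2050": 112.953}
SAMPLE_MS = {"Local PC": 1.068, "Pi Zero": 8.388, "IIoT Box": 10.601, "IoT 2050": 10.324}


def slowdown(times, reference="Local PC"):
    """Ratio of each device's time to the reference device's time."""
    return {device: t / times[reference] for device, t in times.items() if device != reference}


def period_usage(prediction_ms, period_ms):
    """Fraction of one sampling period spent computing a single prediction."""
    return prediction_ms / period_ms


train_slowdown = slowdown(TRAIN_S)
worst_ms = max(SAMPLE_MS.values())
```

Here `period_usage(worst_ms, 5000)` is about 0.2% of the 5-second period and `period_usage(worst_ms, 100)` is about 11% of the 100 ms period, so even the slowest device leaves most of the period free under either budget.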

6. Conclusions

This work assesses the performance of Deep Reservoir Computing models when implemented in real-time industrial applications on embedded devices. The proposed approach is applied to a four-tank industrial process. The implementation process is structured in three main steps: the selection of the best Deep RC architectural setup, the comparison of the selected models with LSTM reference models, and the implementation of the selected models on embedded devices.

The results show that the Deep ESN models outperform the LSTM models in terms of training and prediction times, while also offering similar performance in terms of RMSE and $R^{2}$. The Deep ESN models also offer a significant improvement over Shallow ESN models, with RMSE reductions ranging from 12.31% to 33.86% and $R^{2}$ increases ranging from 0.71% to 14.05%. The best Deep ESN setups for the upper and lower tanks are the SISO-UP and MIMO-LO-A setups, respectively, each with 7 layers.

These findings highlight the potential of Deep ESNs as lightweight and efficient models well-suited to the increasing demand for real-time machine learning solutions in industrial applications. Their reduced computational cost, combined with competitive accuracy, makes them attractive for deployment in constrained embedded environments.

Finally, implementing the selected Deep ESN models on embedded devices demonstrates their ability to operate in real-time, achieving a mean prediction time per sample of approximately 10 ms. This performance makes them appropriate even for industrial process control. The results show that Deep ESN models are suitable for implementation on embedded devices, as they offer similar performance to LSTM models, but with much shorter training and prediction times.

Nonetheless, certain limitations should be noted. One challenge relates to the scalability and generalization of the models when applied to larger or more complex industrial systems. In such cases, the reservoir matrices may grow too large to be processed efficiently on embedded hardware. One possible solution is to adopt a modular approach, whereby the overall system is decomposed into smaller, semi-independent subsystems, each of which is handled by its own Deep ESN, as has been done with the four-tank system. This decomposition would simplify dynamics and allow computational resources to be distributed more effectively.

For this reason, to expand the proposed approach to broader industrial environments, future work will explore the integration of multiple Deep ESNs in a distributed architecture, with each model responsible for a subset of variables or a specific process section.

Future work will also focus on implementing Deep ESN models in a real-time control system to evaluate their performance in real industrial applications. In addition, the possibility of adapting the models in real time with online readout on embedded devices will be tested, as has been done in previous works with Shallow ESNs on desktop PCs. This would allow real-time adaptation of Deep ESN models to changes in the system, which could further improve their performance in real-time industrial applications. Analysing how energy consumption increases when more reservoir layers are added is also particularly important for deploying the models on embedded systems. Therefore, this is another key area for future research.

The robustness of the models in the presence of real industrial noise and operating variability, which are common in production environments, will also be the subject of examination in future work. In addition, neural network ablation studies may be conducted to evaluate the relative contribution of each reservoir layer and connection pattern. This analysis could provide insights for optimizing the architecture and further reducing computational load without compromising performance.

Finally, future work could also explore potential applications of Deep ESN-based models not only for control but also for real-time anomaly detection, leading to intelligent monitoring solutions in industrial systems.

Footnotes

Funding

This work was supported by the Spanish State Research Agency, MCIN/ AEI/ 10.13039/ 501100011033 under Grants PID2020-117890RB-I00 and PID2020-115401GB-I00; and by EU-EIC EMERGE (Grant No. 101070918) and NEURONE, a project funded by the Italian Ministry of University and Research (PRIN 20229JRTZA). The work of José Ramón Rodríguez-Ossorio was supported by a grant from the 2020 Edition of Research Programme of the University of León.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

ORCID iDs

José Ramón Rodríguez-Ossorio

Claudio Gallicchio

Antonio Morán

Ignacio Díaz

Juan J Fuertes

Manuel Domínguez

References

1. Ruiz, Díaz, González, et al. Improving the competitiveness of aircraft manufacturing automated processes by a deep neural network. Integr Comput Aided Eng 2023; 30: 341–352.
2. Pezeshki, Adeli, Pavlou, et al. State of the art in structural health monitoring of offshore and marine structures. Proc Inst Civil Eng - Maritime Eng 2023; 176: 89–108.
3. Jordanou, Antonelo, Camponogara. Echo state networks for practical nonlinear model predictive control of unknown dynamic systems. IEEE Trans Neural Netw Learn Syst 2022; 33: 2615–2629.
4. Salehinejad, Sankar, Barfett, et al. Recent Advances in Recurrent Neural Networks, 2018. DOI: 10.48550/arXiv.1801.01078. ArXiv:1801.01078 [cs].
5. Zhou, Fan, Neri. A spatio-temporal fusion deep learning network with application to lightning nowcasting. Integr Comput Aided Eng 2024; 31: 233–247.
6. LeCun, Bengio, Hinton. Deep learning. Nature 2015; 521: 436–444.
7. Novotny, Liu, Morales-Alvarez, et al. Vehicle side-slip angle estimation under snowy conditions using machine learning. Integr Comput Aided Eng 2024; 31: 117–137.
8. Smith, Kumar, Zhang. An efficient deep learning model for real-time fault diagnosis in industrial systems. Integr Comput Aided Eng 2024; 32: 45–60.
9. Candela, Giordano, Zagaria, et al. Effectiveness of deep learning techniques in tv programs classification: A comparative analysis. Integr Comput Aided Eng 2024; 31: 439–453.
10. Macas Ordónez BdC, Garrigós, Martínez, et al. An explainable machine learning system for left bundle branch block detection and classification. Integr Comput Aided Eng 2024; 31: 43–58.
11. Tanoni, Sobot, Principi, et al. A weakly supervised active learning framework for non-intrusive load monitoring. Integr Comput Aided Eng 2025; 32: 37–54.
12. Gao, et al. Deep Learning Workload Scheduling in GPU Datacenters: Taxonomy, Challenges and Vision, 2022. DOI: 10.48550/arXiv.2205.11913. ArXiv:2205.11913 [cs].
13. Pandey, Fernandez, Gentile, et al. The transformational role of GPU computing and deep learning in drug discovery. Nat Mach Intell 2022; 4: 211–221.
14. Sherstinsky. Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network. Phys D: Nonl Phenomena 2020; 404: 132306.
15. Tortorella, Gallicchio, Micheli. Hierarchical dynamics in deep echo state networks. In: Pimenidis E, Angelov P, Jayne C et al. (eds.) Artificial neural networks and machine learning – ICANN 2022. Lecture Notes in Computer Science, 2022, pp.668–679. Cham: Springer Nature Switzerland. ISBN 978-3-031-15934-3. DOI: 10.1007/978-3-031-15934-3_55.
16. Lee, Tanaka, Gupta. Adaptive reservoir computing for time-series prediction in manufacturing processes. Integr Comput Aided Eng 2022; 30: 215–230.
17. Lukoševičius, Jaeger. Reservoir computing approaches to recurrent neural network training. Comput Sci Rev 2009; 3: 127–149.
18. Gallicchio, Micheli, Pedrelli. Deep reservoir computing: A critical experimental analysis. Neurocomputing 2017; 268: 87–99.
19. Gallicchio, Micheli, Pedrelli. Hierarchical temporal representation in linear reservoir computing. In: Esposito A, Faundez-Zanuy M, Morabito FC et al. (eds.) Neural advances in processing nonlinear dynamic signals. Smart Innovation, Systems and Technologies, 2019, pp.119–129. Cham: Springer International Publishing. ISBN 978-3-319-95098-3. DOI: 10.1007/978-3-319-95098-3_11.
20. Sun, Song, Cai, et al. A systematic review of echo state networks from design to application. IEEE Trans Artif Intell 2024; 5: 23–37.
21. Rodríguez-Ossorio, Gallicchio, Morán, et al. Deep echo state networks for modelling of industrial systems. In: Engineering applications of neural networks, 2024, pp.106–119. Cham: Springer Nature Switzerland. ISBN 978-3-031-62495-7. DOI: 10.1007/978-3-031-62495-7_9.
22. Shahi, Fenton, Cherry. Prediction of chaotic time series using recurrent neural networks and reservoir computing techniques: A comparative study. Mach Learn Appl 2022; 8: 100300.
23. Jaeger. The “echo state” approach to analysing and training recurrent neural networks-with an erratum note. Bonn, Germany: German Natl Res Cent Inform Technol GMD Techn Rep 2001; 148: 13.
24. Jaeger, Lukoševičius, Popovici, et al. Optimization and applications of echo state networks with leaky-integrator neurons. Neural Netw 2007; 20: 335–352.
25. Gallicchio, Micheli. Architectural and markovian factors of echo state networks. Neural Netw 2011; 24: 440–456.
26. Yildiz, Jaeger, Kiebel. Re-visiting the echo state property. Neural Netw 2012; 35: 1–9.
27. Lukoševičius. A practical guide to applying echo state networks. In: Neural Networks: Tricks of the Trade: Second Edition, 2012. pp.659–686. Berlin, Heidelberg: Springer Berlin Heidelberg. ISBN 978-3-642-35289-8. DOI: 10.1007/978-3-642-35289-8_36.

28.

Gallicchio

Micheli

. Echo state property of deep reservoir computing networks. Cognit Comput 2017; 9: 337–350. DOI: 10.1007/s12559-017-9461-9

29.

Gallicchio

Micheli

Pedrelli

. Design of deep echo state networks. Neural Netw 2018; 108: 33–47.

30.

Trouvain

Pedrelli

Dinh

, et al. Reservoirpy: An efficient and user-friendly library to design echo state networks. In: Artificial neural networks and machine learning – ICANN 2020, 2020, pp.494–505. Cham: Springer International Publishing. ISBN 978-3-030-61616-8. DOI: 10.1007/978-3-030-61616-8_40.

31.

Abadi

, et al. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. https://www.tensorflow.org/. Software available from tensorflow.org.

32.

Chollet

, et al. Keras. https://keras.io, 2015.

33.

Johansson

. The quadruple-tank process: A multivariable laboratory process with an adjustable zero. IEEE Trans Control Syst Technol 2000; 8: 456–465. DOI: 10.1109/87.845876

Embedded deep reservoir computing for modelling complex industrial systems

Table 5. Tested ranges of LSTM hyperparameters and best options selected.

| Parameter | Tested values | Best upper | Best lower |
| --- | --- | --- | --- |
| Number of units | 8, 16, 32, 64 | 64 | 64 |
| Dropout | 0.0, 0.1, 0.2, 0.4 | 0.2 | 0.4 |
| Leaking rate | $1 \times 10^{-4}$ to $1 \times 10^{-2}$ | $1.6 \times 10^{-4}$ | $3.7 \times 10^{-4}$ |
| Epochs | 200 | 200 | 200 |
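As a rough illustration of the search space in Table 5, the following minimal sketch enumerates candidate LSTM configurations. It is not the authors' tuning procedure: the source does not state how the rate interval was explored, so log-uniform sampling is an assumption here, and the names `SEARCH_SPACE`, `sample_rate` and `candidate_configs` are hypothetical. The key `rate_range` refers to the row the table labels "Leaking rate".

```python
import itertools
import math
import random

# Search space transcribed from Table 5; the key names are illustrative only.
SEARCH_SPACE = {
    "units": [8, 16, 32, 64],
    "dropout": [0.0, 0.1, 0.2, 0.4],
    "rate_range": (1e-4, 1e-2),  # the row labelled "Leaking rate" in Table 5
    "epochs": 200,
}


def sample_rate(rng, low, high):
    """Draw one value log-uniformly from [low, high] (assumed sampling scheme)."""
    return 10 ** rng.uniform(math.log10(low), math.log10(high))


def candidate_configs(space, rates_per_combo=1, seed=0):
    """Yield one dict per (units, dropout) pair and sampled rate."""
    rng = random.Random(seed)
    low, high = space["rate_range"]
    for units, dropout in itertools.product(space["units"], space["dropout"]):
        for _ in range(rates_per_combo):
            yield {
                "units": units,
                "dropout": dropout,
                "rate": sample_rate(rng, low, high),
                "epochs": space["epochs"],
            }


configs = list(candidate_configs(SEARCH_SPACE))
print(len(configs))  # 4 unit options x 4 dropout options = 16
```

Each resulting dictionary could then be passed to whatever model-building routine is in use; fixing `seed` keeps the sampled rates reproducible between runs.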
