Sage Journals: Discover world-class research

Abstract

Intrinsic viscosity is a critical evaluation indicator for polymer quality during the aggregation process. Accurate prediction of intrinsic viscosity is essential for the online monitoring and control of polymer quality. The aggregation process is a typical industrial production mode influenced by production continuity and varying sensor spatial locations. The data from this production process exhibits characteristics such as parameter redundancy, long time sequences, and time delays. Existing methods struggle to accurately match data features, resulting in poor prediction accuracy. To address these issues, we propose a three-stage method for predicting polymer intrinsic viscosity. Firstly, we introduce a key feature extraction method based on the maximum information coefficient and an approximate Markov blanket model to address the redundancy in aggregation process data parameters. Secondly, we propose a time delay analysis method based on cross-correlation to estimate time delay relationships and match time delay features in the aggregation process. Finally, we develop an improved information prediction model for long-term sequence prediction, utilizing locally sensitive hash attention to replace ProbSparse self-attention. This ensures the prediction effect remains effective as the sequence length increases. The results demonstrate that our proposed method reduces feature redundancy, calculates delay times similar to simulation results, and offers significant advantages in prediction accuracy compared to other time series prediction models under different sequence lengths.

Keywords

Polymer intrinsic viscosity time delay long sequence time series

Introduction

The polymerization process of polyester fibers is the initial step in their production, where the synthesized polymer is directly processed to form the fibers. Intrinsic viscosity is a crucial quality indicator of polymers,¹ reflecting their fluidity. Significant fluctuations in intrinsic viscosity can lead to fiber breakage during spinning and winding processes, causing material waste and severely affecting the quality of polyester fiber products while reducing production efficiency. Methods for determining the intrinsic viscosity of polymers are categorized into direct and indirect methods. The direct measurement method involves sampling the polymer and sending it to the laboratory for analysis.² This method cannot be measured online and has a time delay.³ In actual production, factories typically use indirect measurement methods to infer the intrinsic viscosity of polymers by detecting the outlet pressure of polymer delivery pumps. This approach only provides a range of intrinsic viscosity values and cannot accurately detect specific values. Currently, these methods cannot preemptively address the issue of polymer intrinsic viscosity fluctuations, highlighting the urgent need for a reliable polymer intrinsic viscosity prediction method.

The process flow diagram of the aggregation process, shown in Figure 1, includes stages such as terephthalic acid (TPA) feed and slurry ratio, esterification reaction, oligomer transport and additive injection, pre-condensation reaction, final polycondensation reaction, and other processes.⁴ During the polymerization process, a significant amount of sensor data is collected, but only certain process parameters affect the intrinsic viscosity of the polymer, necessitating feature extraction.⁵ Unlike other production industries, the polymerization process requires a specific residence time for the fluid to move through the pipeline. Adjusting process parameters does not immediately trigger chemical reactions; the fluid still requires a certain residence time to react upon entering the reactor, leading to a characteristic time delay in the aggregation process. The intrinsic viscosity of polymers produced during the polymerization process is not constant. To maintain long-term stability of polymer intrinsic viscosity, it is essential to achieve long-term prediction of this property. The polymerization process is typical of industrial production with a long production cycle and complex process flow. The production operation data exhibits long sequence and time series characteristics, with numerous parameters affecting intrinsic viscosity, often with coupling relationships between them. These characteristics present significant challenges for predicting intrinsic viscosity.

Figure 1.

Aggregation process flowchart.

There are two main methods for predicting the intrinsic viscosity of polymers: mechanistic modeling and data-driven modeling.^6–9 Mechanistic modeling involves analyzing physical and chemical reactions, such as reaction kinetics, energy conservation laws, and mass conservation equations in actual processes,^10–12 to construct a series of differential and algebraic equations that form a mechanistic model. The reliability and interpretability of mechanistic modeling are high, as it directly reflects the inherent relationships between process variables. However, due to the complexity of processes in the chemical industry, it is difficult to obtain accurate mechanistic equations. In existing mechanistic modeling, complex factors such as strong nonlinearity, multivariable coupling, and transient response are often simplified.^13,14 Simplified models struggle to accurately reflect the true state of complex processes. Additionally, even when mechanistic models are constructed using complex differential and algebraic equations, the challenging solving process, slow convergence speed, and low generalization significantly limit their application and promotion.^15–19 To address the limitations of mechanistic modeling in capturing complex features such as multivariable coupling, nonlinearity, and dynamics, some studies^20,21 have analyzed and extracted data characteristics in process industries to improve the performance of quality prediction. For example, Yuan et al.²² proposed a time-aware dynamic imputation and interval auxiliary network with an attention mechanism to address the heterogeneous nature of time information modeling in data sequences and the issue of missing values. Data-driven modeling does not require the analysis of complex physicochemical reactions during the aggregation process. Instead, only process parameters closely related to key quality indicators need to be selected as inputs to the model.²³ By analyzing the characteristics of the data itself, commonly used modeling methods are selected to establish a mapping based on the relationship between process variables and quality indicators. Finally, the established model is regularly trained and calibrated based on newly obtained data. Data-driven modeling can fit the relationship between process variables and quality indicators in various complex environments. Therefore, data-driven modeling methods are more suitable for predicting the intrinsic viscosity of polymers.

Although the aforementioned data-driven modeling methods effectively address complex factors such as strong coupling and dynamics in process industry data, modeling time delay and long-term prediction of intrinsic viscosity in polymerization processes remain significant challenges. Therefore, we propose a three-stage prediction model for polymer intrinsic viscosity using long sequence time series data. The main contributions of this paper can be summarized as follows:

(1) To address feature redundancy, we propose a feature extraction method based on the maximum information coefficient (MIC) and approximate Markov blanket, which effectively filters features.

(2) To handle time delay characteristics, a time delay analysis method was designed to address the delay time between various process parameters and between process parameters and polymer intrinsic viscosity during the polymerization process, achieving data feature matching.

(3) To manage long sequence time series characteristics, we innovatively employ position-sensitive hash attention in the Informer model to replace ProbSparse self-attention for predicting polymer intrinsic viscosity, thereby achieving long-term prediction of intrinsic viscosity.

The remainder of the paper is organized as follows. Section “Related work” reviews related work on time delay and long sequence time series prediction. Section “Proposed method” proposes a three-stage model for predicting polymer intrinsic viscosity using long sequence time series data. Section “Experiments” presents practical case studies and verifies the superiority of the proposed method. Section “Conclusion” discusses the research conclusions and future work.

Related work

This article focuses on predicting polymer intrinsic viscosity, considering the characteristics of time lag and long sequence time series. It proposes a time lag estimation method and a polymer intrinsic viscosity prediction method. The following sections introduce the existing related work in two areas: time delay analysis and time series prediction.

Time delay is a common issue in many fields. Existing methods primarily estimate the impact relationship of time delay based on correlation and mutual information, among other techniques. Many scholars have conducted in-depth research on this topic. Bai and Zhao²⁴ proposed a transformer-based multivariable and multi-step prediction method for detecting delays in certain chemical faults. The transformer predicts changes in the next process variable, using iterative prediction for multi-step changes in the process variable. Li et al.²⁵ proposed a new method for extracting dynamic time delay and reconstructing multivariate data to address dynamic time delay, improving the Attention-LSTM prediction model. This method adaptively adjusts the time position and span of multivariate data based on dynamic time delay to suit input/output pairing in Attention-LSTM networks. Yao²⁶ proposed a data-driven industrial quality predictor with variable time delay estimation to resolve the time delay between process variables and quality variables. Zhang et al.²⁷ used Time Delay High Correlation Analysis to calculate the time delay correlation coefficient between each pair of stations, thereby quantifying the potential time delay effect of pollutant concentration time series between each pair of stations at different time scales. The above time delay analysis methods, due to their engineering problem characteristics, can only analyze the time delay relationship between process variables and variables or between process variables and target variables, but cannot balance both. Therefore, we propose a time delay analysis method based on cross-correlation to address these shortcomings.

For the increasing demand for sequence length, the main challenge in long sequence time series prediction is to improve predictive ability, a topic extensively studied by many scholars. The Transformer model, proposed by Vaswani et al.²⁸ in 2017, uses an attention mechanism as the backbone structure, enabling parallel processing and temporal feature extraction for long sequence time series data input. Child et al.²⁹ introduced the Sparse Transformer, which incorporates self-attention through decomposition and trains hundreds of layers of dense attention networks via sparse matrix decomposition, theoretically solving the problem of predicting ultra-long sequences. Mo et al.³⁰ combined individual gate convolutional units with local context information at each time step and utilized the Transformer network encoder layer as the main prediction network, achieving good results in temporal data prediction. Despite the success of these Transformer-based models in predicting long sequence time series, their high computational complexity and low efficiency make them unsuitable for long process industrial scenarios. The following works provide better approaches for predicting the intrinsic viscosity of long sequence time series polymers. Wu et al.³¹ designed a new decomposition architecture with an autocorrelation mechanism called the autotransformer, which can gradually decompose complex long sequence time series. Experiments show that this method outperforms the Transformer in long sequence time series prediction. Kitaev et al.³² replaced the dot product attention in the Transformer with locally sensitive hash attention, achieving a sequence length of 64k, significantly longer than the common 512. However, the output prediction results are sequential, with subsequent outputs relying on previous time steps, leading to slow inference. Zhou et al.³³ proposed the Informer model for long sequence time series prediction of power transformer temperature, using a parallel generative decoder mechanism to output all prediction results for long sequence time series at once. However, this method has limitations on the length of the prediction sequence. Considering the advantages and disadvantages of the aforementioned methods, we propose a locally sensitive hash attention-Informer model based on time delay analysis (TDA-LSHA-Informer) for predicting intrinsic viscosity. This method balances time complexity, prediction sequence length, and prediction speed.

Proposed method

The polymerization process in the chemical industry exhibits three key characteristics: (1) Redundancy in characteristic quantities: to ensure production safety, numerous sensors are installed on the production line, but only the parameters from certain sensors affect the intrinsic viscosity of the polymer. (2) Time delay: adjusting process parameters in the polymerization process requires a period before a chemical reaction occurs in the subsequent process. (3) Long sequence time series prediction: the intrinsic viscosity of polymers fluctuates during production rather than remaining constant. Ensuring the long-term stability of polymer intrinsic viscosity necessitates predicting its value over a future period. Considering these characteristics, we propose a three-stage polymer intrinsic viscosity prediction model, as illustrated in Figure 2. First, the paper introduces a correlation analysis method based on feature sorting using the MIC as the metric and a redundancy analysis method based on approximate Markov blankets. Finally, the TDA-LSHA-Informer model is employed to predict the intrinsic viscosity of polymers.

Figure 2.

Schematic diagram of three-stage polymer intrinsic viscosity prediction method.

Feature selection method

To ensure the production safety of the polymerization process, numerous sensors are placed on the production line. Most of the data from these sensors are independent of the quality of polymer products. Inputting all this irrelevant redundant data into the model would negatively affect its prediction accuracy. Therefore, it is essential to perform feature filtering on the data in advance.

The MIC measures the mutual information between features, capturing both linear and nonlinear relationships, as well as non-functional relationships between two variables. The approximate Markov blanket method is an efficient approximation of the Markov blanket, significantly enhancing the speed of feature selection and search strategies while maintaining effectiveness. We propose a correlation analysis method based on feature sorting and a redundancy analysis method based on approximate Markov blankets, using the MIC as the measurement standard, and selecting features for the polyester fiber polymerization process based on the criteria of maximum correlation and minimum redundancy.

The feature correlation calculation method based on the MIC primarily involves mutual information and grid partitioning. The formula for calculating the MIC is as follows:

\begin{matrix} M I C [X; Y] = \max_{| X | | Y | < B} \frac{I (X; Y)}{l o g_{2} (\min (| X |, | Y |))} \end{matrix}

(1)

In the equation, I(X; Y) represents the mutual information values of variables X = { $x_{i}$ , i = 1, 2, . . ., n} and Y = { $y_{i}$ , i = 1, 2, . . ., n}, | X | and | Y | represent the number of values of variables X and Y divided on the grid, respectively, and B represents the upper limit value of grid division | X | *× | Y |.

I (X; Y) = \sum_{X, Y} p (X, Y) l o g_{2} \frac{p (X, Y)}{p (X) p (Y)}

(2)

In the equation, p(X, Y) represents the joint probability density of variables X and Y, and p(X) and p(Y) represent the edge probability density of variables X and Y.

In response to the redundancy characteristics of the polyester fiber polymerization process dataset, we use the approximate Markov blanket method to remove redundant features from the feature set. The conditions for the approximate Markov blanket of j are as follows:

\begin{matrix} {\begin{matrix} M I C (x_{i}, y) > M I C (x_{j}, y) \\ M I C (x_{i}, x_{j}) > M I C (x_{j}, y) \end{matrix} \end{matrix}

(3)

In the equation, MIC( $x_{i}$ , y), MIC( $x_{j}$ , y), and MIC( $x_{i}$ , $x_{j}$ ) respectively represent the maximum information coefficients of variable $x_{i}$ and label y, variable $x_{j}$ , label y, and variable $x_{i}$ and label $x_{j}$ , respectively.

Time delay analysis method

The time taken by a fluid to traverse a continuously flowing pipeline and container is called residence time. Due to factors such as residence time and fluid velocity, data measured at the same time point often reflect different logistics or media states, adversely affecting further analysis and processing. When adjusting parameters such as temperature and pressure, immediate adjustments to the expected values are not possible, requiring a certain waiting period. For instance, if the target temperature is 300°C and the current temperature is 200°C, it will take some time to reach the specified temperature. Additionally, since the polymerization process involves chemical reactions, sufficient reaction time must be allowed for the material. Consequently, after adjusting the process parameters, a waiting period is necessary before the characteristic viscosity changes. This issue must be considered in data modeling. It is essential to calculate the delay time between each parameter, align them, and input them into the prediction model.

Therefore, we propose a method for estimating time delay to measure the delay between various process parameters and intrinsic viscosity. It is important to note that the time delay considered in this study is caused by process dwell time, which differs from delays caused by measurement components, measurement processes, control components, or execution components in control systems.

Assuming there are m types of sensor data X = [ $X_{1}, X_{2}, \dots, X_{m}]$ , with a sensor sampling period of T = [ $T_{1}, T_{2}, \dots, T_{m}]$ , If $X_{m}$ is the intrinsic viscosity value, $X_{1}$ is the pre condensation temperature, and the intrinsic viscosity data obtained from on-site sampling and the pre condensation temperature are $X_{m} {(t)}_{t = 0 \dots N}$ and $X_{1} {(t)}_{t = 0 \dots N}$ , where $t$ is the sampling time and N is the time length, the delay time of $X_{m}$ relative to $X_{1}$ can be calculated by formula 1

\begin{matrix} τ_{m} = \arg m a x (| C_{X_{1} X_{m}} (l) |) * T_{m} \end{matrix}

(4)

where | | represents the absolute value symbol $\arg \max$ () represents taking the maximum value of the interval.

\begin{matrix} C_{x_{1} x_{m}} (l) = \frac{R_{X_{1} X_{m}} (l)}{\sqrt{R_{X_{1}} R_{X_{m}} (l)}} \end{matrix}

(5)

\begin{matrix} R_{x_{1} x_{m}} (l) = E {(X_{1} (t) - u_{x_{1}} (t)) * (X_{m} (t + l) - u_{x_{m}} (t + l))} \end{matrix}

(6)

\begin{matrix} R x_{1} = E {{(X_{1} (t) - u_{x_{1}} (t))}^{2}} \end{matrix}

(7)

\begin{matrix} R x_{m} = E {{(X_{i} (t + l) - u_{x_{i}} (t + l))}^{2}} \end{matrix}

(8)

where $u_{x_{1}} (t)$ is the median of the range of intrinsic viscosity $X_{1} (0)$ to $X_{1} (t)$ , $u_{x_{i}} (t + l)$ is the median of the range of pre-condensation temperature $X_{i} (0)$ to $X_{i} (t + l)$ .

Long sequence time-series predicting method

The current mainstream approach to predicting the intrinsic viscosity of polymers involves using time series prediction with extensive production history data. However, most existing methods focus on short-term sequence prediction. To ensure the stability of polymer intrinsic viscosity, it is necessary to predict long-term intrinsic viscosity values.

Therefore, we propose a long sequence series prediction method based on the Informer framework, as shown in Figure 3. The model consists of an encoder and a decoder, incorporating Multi-head Self-Attention, Self-Attention Distilling, and Locality Sensitive Hashing Attention.

Figure 3.

TDA-LSHA-informer model structure.

Positional encoding

Assuming the input of the t-th sequence is $X^{t}$ and consists of P types of global timestamps, and the input dimension of the model is $d_{m o d e l}$ , we first use a fixed position encoding (PE) to represent local position information, namely:

\begin{matrix} P E_{(p o s, 2 j)} = \sin (\frac{p o s}{{(2 L_{x})}^{\frac{2 j}{d_{m o d e l}}}}) \end{matrix}

(9)

\begin{matrix} P E_{(p o s, 2 j + 1)} = \sin (\frac{p o s}{{(2 L_{x})}^{\frac{2 j}{d_{m o d e l}}}}) \end{matrix}

(10)

where $j \in {1, \dots, d_{m o d e l} / 2}$ .

Each type of global timestamp is assigned a learnable timestamp embedding $S E_{(p o s)}$ , and each embedding has mean intrinsic viscosity (up to 60, which is the smallest granularity in minutes). In addition, to align the dimensions, we use one-dimensional convolution (with a kernel width of 3 and a step size of 1) to map $x_{i}^{t}$ to the $u_{t}^{i}$ of the $d_{m o d e l}$ dimension. The final sequence input into the model is:

\begin{matrix} X_{f e e d [i]}^{t} = α u_{i}^{t} + P E_{(L_{x} \times (t - 1) + i)} + \sum_{p} [S E_{(L_{x} \times (t - 1) + i)}] p \end{matrix}

(11)

where $i \in {1, \dots, L_{x}}$ , $α$ is a hyperparameter, which is used to balance $u_{i}^{t}$ , embedding with location encoding and timestamp. If the input has been standardized, it is recommended that $α = 1$ .

Multi-head self-attention

Multi Head Attention is composed of multiple Scaled Dot-Product Attention. Scaled Dot Product Attention can be described as mapping a query and a set of key value pairs to an output, where the query, key, value, and output are all vectors. The input consists of queries and keys of dimension $d_{k}$ , and values of dimension $d_{v}$ .

In practice, we compute the attention function on a set of queries simultaneously, packed together into a matrix Q. The keys and values are also packed together into matrices K and V. We compute the matrix of outputs as:

\begin{matrix} A t t e n t i o n (Q, K, V) = s o f t m a x (\frac{Q K^{T}}{\sqrt{d_{k}}}) V \end{matrix}

(12)

Locality sensitive hashing Attention

For Locality sensitive hashing Attention, we refer to the structure of the multi head attention mechanism and focus on the calculation of formula 12. Assuming that the shape [batch_size, length, $d_{m o d e l}$ ] is applied to Q, K, V, according to Formula 12, as it is a constant, the shape of Q $K^{T}$ [batch_size, length, length] will become a major problem. The approximate calculation rate of Softmax is determined by Q $K^{T}$ , so in each query, we only need to look for the key of $q_{i}$ near K. For example, when the length of K is 64K, we only need to look for 32 or 64 closest keys. This method is highly efficient, but it needs to solve the problem of quickly finding adjacent neighbors in high-dimensional space.

We employ a random projection method to address this problem. When adjacent vectors have a high probability of obtaining the same hash while distant vectors do not, the hashing scheme that assigns each vector x to hash h (x) is termed local sensitivity. Our goal is for nearby vectors to obtain the same hash with a high probability, ensuring that hash buckets have a similar high probability. As shown in Figure 4, angle-sensitive hashing uses the random rotation of spherical projection points to establish buckets on the signed axis projection using argmax. In the upper half of the figure, the spherical projection distance between points x and y is relatively large, and they will not share the same hash bucket at three angles unless the spherical projection distance is close.

Figure 4.

Sphere projected point with random rotation.

Assume that the size of Random matrix R is [ $d_{k}$ , b/2], define h(x) = argmax ([xR; −xR]), where [u; v] represents the connection of two vectors. Through this known method, b hash values can be obtained. Based on the above content, we will rewrite the attention equation as:

\begin{matrix} o_{i} = \sum_{j \in P_{i}} \exp (q_{i} k_{j} - z (i, P_{i})) v_{j} \end{matrix}

(13)

\begin{matrix} P_{i} = {j : i \geq j} \end{matrix}

(14)

$P_{i}$ represents the set of interest for the query at position i, and z is the normalization term for softmax. The equation also omits $\sqrt{d_{k}}$ for clarity.

In order to achieve batch processing, we usually focus on larger sets P $\tilde{P_{i}} = {0, 1, \dots, l} \supseteq P_{i}$ while masking out elements not in $P_{i}$ :

\begin{matrix} o_{i} = \sum_{j \in \tilde{P_{i}}} \exp (q_{i} \cdot k_{j} - m (j, P_{i}) - z (i, P_{i})) v_{j} \end{matrix}

(15)

\begin{matrix} m (j, P_{i}) = {\begin{matrix} \infty i f j \notin P_{i} \\ 0 o t h e r w i s e \end{matrix} \end{matrix}

(16)

For Locality Sensitive Hashing (LSH) attention, we can consider limiting the set Pi of target items that query location i can focus on by allowing attention only within a single hash bucket.

\begin{matrix} P_{i} = {j : h (q_{i}) = h (k_{j})} \end{matrix}

(17)

Figure 5(1) and (2) on the right side illustrate a brief comparison between complete attention and hash variants. In (1), the attention matrix used for complete attention is typically sparse, but this sparsity is not considered during computation. In (2), the sorting order of queries and keys is determined by the hash bucket. Because similar items are likely to fall into the same bucket, we can approximate the full attention pattern by allowing only attention within each bucket.

Figure 5.

Schematic comparison of full-attention with a hashed variant.

There are two problems with directly splitting hash buckets: the size of hash buckets in this formula is often uneven, which makes batch processing across buckets difficult. In addition, the number of queries and keys in a bucket may not be equal. In fact, a bucket may contain many queries but no keys. To solve these problems, our solution is to set $k_{j} = \frac{q_{j}}{‖ q_{j} ‖}$ to ensure that $h (k_{j}) = h (q_{j})$ . Then, arrange the query in order based on the bucket number and sort it by sequence position within each bucket. This defines a permutation where i → $s_{i}$ after sorting. As shown in Figure 5(3), in the sorted attention matrix, pairs from the same bucket converge near the diagonal. In Figure 5(4), we can use a batch processing method where m consecutive query blocks (sorted) attend to each other, and one block returns. This corresponds to setting:

\begin{matrix} {\tilde{P}}_{i} = {j : \frac{s_{i}}{m} - 1 \leq \frac{s_{j}}{m} \leq \frac{s_{i}}{m}} \end{matrix}

(18)

If $m > m a x | P_{i} |$ , then ${\tilde{P}}_{i} \supseteq P_{i}$ . In this paper, we set m = $\frac{2 l}{n_{b u c k e t s}}$ (where l is the sequence length). The average bucket size is $\frac{l}{n_{b u c k e t s}}$ . Suppose the probability of the bucket growing to twice its size is low enough. The entire process of LSH attention is summarized in Figure 4.

Self-attention distilling

According to the LSH Attention process, redundancy is inevitably generated in the Attention results. The Informer network addresses this by shortening the original sequence length through a self-attention distillation operation, which highlights the main attention and extracts dominant features using maximum pooling. Traditional distillation structures do not incorporate a self-attention mechanism, so they simply shorten the sequence length through distillation operations, potentially leading to the loss of important information. The attention distillation structure used in this paper includes an attention mechanism, enabling the network to focus on important information in the sequence while shortening the original sequence length. After the self-attention distillation operation, a more focused self-attention feature map is generated within a smaller range. The self-attention distillation operation from layer f to layer f + 1 is calculated as:

\begin{matrix} X_{f + 1}^{t} = M a x P o o l (E L U (C o n v 1 d ({[X_{f}^{t}]}_{A B}))) \end{matrix}

(19)

In the formula, ${[X_{f}^{t}]}_{A B}$ represents LSH Attention calculation, $M a x P o o l$ represents average pooling. $C o n v 1 d$ represents performing one-dimensional convolution on the time dimension, activating it through the exponential linear units (ELU) function, and then performing maximum pooling downsampling through Maxpool.

Generative style decoding

The decoder consists of a Masked LSH Attention layer and a Multi-Head Self-Attention layer. The LSH Attention needs to be masked to avoid leftward information flow and prevent autoregression. Additionally, a fully connected layer is required, with its output dimensions depending on the dimensions of the variables to be predicted. We adopt a generative inference process to improve the speed of inference. Specifically, the input to the decoder is:

\begin{matrix} X_{f e e d_{d e}}^{t} = C o n c a t (X_{t o k e n}^{t}, X_{0}^{t}) \in R^{(L_{t o k e n} + L_{y}) \times d_{m o d e l}} \end{matrix}

(20)

where $X_{t o k e n}^{t} \in R^{L_{t o k e n} \times d_{m o d e l}}$ is start token, $X_{0}^{t} \in R^{L_{y} \times d_{m o d e l}}$ is a placeholder reserved for the prediction sequence (set the scalar value to 0). The start token is an effective technology in the dynamic decoding process of Natural language processing. Generally, the start token is set as a special delay, while the start token in Informer is a sequence, which is intercepted from the input of encoder. For example, to predict a 10 h sequence, the last 8 h of the input sequence can be truncated as the start token. $X_{0}^{t}$ contains the timestamp embedding of the predicted sequence. The decoding process of the entire decoder abandons the dynamic decoding process and adopts a single forward process to decode the entire output sequence.

Experiments

In this section, we conducted three experiments to verify the performance of the proposed method. First, we preprocessed the data. Then, we conducted feature selection experiments. Subsequently, we performed time delay experiments on certain polymerization processes. Finally, we compared the polymer intrinsic viscosity prediction method proposed in this paper with other methods. The experimental data in this study are sourced from a polyester fiber factory, collected between March and September 2019. The dataset includes 1950 sensor readings from polyester fiber polymerization processes, encompassing parameters such as slurry density, pre-condensation pressure difference, and pre-condensation output temperature. The specific details are shown in Table 1. These parameters were selected based on process knowledge and engineering experience. For clarity, the necessary abbreviations and symbols in the table are explained as follows: EG – ethylene glycol, TPA – terephthalic acid, HW – heat exchanger plate, TiO₂ – titanium dioxide, DEG – diethylene glycol, UFPP – pre-shrinkage polymerization, FIN – final shrinkage polymerization.

Table 1.

Process parameters.

Serial number	Sensor description	Serial number	Sensor description
1	Pressure of TPA mixing tank	2	TPA slurry preparation tank speed
3	Slurry mix tank current agitator	4	Liquid level control of slurry preparation tank
5	Slurry density	6	Flow rate from EG to slurry preparation tank
7	Total circulation tank liquid level	8	EST flows to EG slurry mixing tank
9	Added ethylene glycol flow rate	10	Flow from slurry preparation tank to feeding tank
11	Injection pressure of EG sizing agent	12	Injection pressure of TPA sizing agent
13	Liquid level of slurry feed tank	14	Flow rate of slurry A to esterification tank
15	Flow rate of slurry B to esterification tank	16	Esterification tank liquid level
17	Esterification tank temperature	18	Supply pressure
19	Lead supply pressure	20	Front end temperature of injection sizing agent
21	Temperature after spraying adhesive	22	Esterification tank pressure A
23	Esterification tank pressure B	24	Siphon pressure
25	Temperature of first plate	26	Flow of the SC reflux liquid
27	Esterification tank HW liquid level	28	Esterification tank HW temperature
29	Esterified round water temperature	30	Desalination water flow rate
31	Outlet desalination water temperature	32	Oligomer flow rate
33	Low polymer pipeline pressure	34	Low polymer pump speed
35	Outlet pressure of pump oligomer A	36	Outlet pressure of pump oligomer B
37	Low polymer thermal conductivity heating pressure	38	Catalyst level
39	Catalyst spraying to tank temperature	40	Catalyst injection pressure
41	Oligomer flow rate	42	Catalyst//oligomer flow rate
43	TiO₂ liquid level	44	TiO₂//oligomer flow
45	DEG tank level	46	DEG/oligomer percentage
47	DEG flow rate	48	DEG injection pressure
49	Stable flow rate	50	Stable diethylene glycol//oligomer flow rate
51	Liquid level in the pre-shrinkage polymerization 16th layer plate	52	Outlet temperature of UFPP
53	UFPP pilot supply pressure	54	Outlet temperature of UFPP 16th
55	Pilot pressure of UFPP 16th	56	UFPP pressure
57	UFPP differential pressure	58	Pilot pressure of UFPP
59	Spray temperature of UFPP EG	60	Level of UFPP HW
61	Inlet liquid level of FIN	62	Outlet liquid level of FIN
63	Outlet temperature of FIN	64	Pilot pressure of FIN
65	FIN pressure	66	Agitator rotational velocity of FIN

Data preprocessing

Unlike traditional commercial data, production data in process industries often contain inevitable measurement errors. Additionally, this data exhibits strong temporal characteristics, high inter-variable coupling, and temporal fluctuations. These characteristics can adversely affect the accuracy of time delay calculations. Therefore, it is crucial to preprocess the data before conducting time delay analysis.

First, significant errors in the data need to be eliminated. Significant errors, also known as gross errors, are primarily caused by factors such as process leaks, instrument failures, unstable operations, and the psychological and physiological limitations of the measurement personnel. We adopt the following two steps to eliminate significant errors from the sample data.

(1) According to process requirements and operating experience, the operating range of the original data variables is determined by analyzing the sampling data. Then, the maximum and minimum limiting method is applied to eliminate data that fall outside this range (predefined threshold).

(2) The Leda criterion is used to detect and eliminate data with significant errors. The mathematical method is as follows:

\begin{matrix} σ = \sqrt{\frac{1}{n - 1} \sum_{i = 1}^{n} {(x_{i} - \bar{x})}^{2}} \end{matrix}

(21)

\begin{matrix} v_{i} = x_{i} - \bar{x} \end{matrix}

(22)

if the deviation $v_{i}$ of a certain monitoring data $x_{i}$ is greater than 3, it is considered that the data has a significant error and should be removed.

Since time-delay calculation requires continuously spaced sample data, and the data involved is often discontinuous due to omitted sampling times and error elimination, it is necessary to interpolate and fill the missing data before performing time-delay calculations. In this paper, a simple mean value method is adopted, where the average value of the sample data is used to fill in the gaps. This method is straightforward and easy to implement.

Feature selection experiment

We obtained approximately 540,000 data samples from the sensor at a sampling interval of 30 s. Out of these, 432,000 samples were selected to form the training dataset, with the mean intrinsic viscosity as the output variable. The remaining samples were used to create the test dataset. When the adjustment step size is 0.1, the optimal learning rate is 0.002. Because there is a conflict between the training batch size and convergence, that is to say, if a sufficiently large batch of data is selected for training each time, the convergence of the model will be poor. Finally, the learning rate was set to 0.002, the batch size to 64, the number of epochs to 100, the optimizer was Adam, and the stride of the max pooling layer was set to 2.

The results are shown in Figure 6, where the x-axis represents the F-score, and the y-axis represents 1950 different features. By selecting multiple thresholds for testing and using Mean Square Error (MSE) as an evaluation indicator, features can be selected from the ranking of feature correlations. The threshold is calculated as the feature correlation of the current feature score divided by the sum of all 60 feature scores. For example, if the number of selected features is 10, the threshold is equal to the score of the top 10 features in the feature importance ranking divided by the total score of all 60 features. As shown in Table 2, due to space limitations, only a small number of features and corresponding thresholds are listed. Finally, among the 60 features, the approximate Markov blanket feature selection method selects the top 30 features most relevant to intrinsic viscosity as the final input variables of the model.

Figure 6.

Importance ranking of process parameters.

Table 2.

The statistical MSE results of Testing set on different feature numbers.

Feature number	Threshold	MSE
10	0.024	0.00296
20	0.020	0.00299
30	0.016	0.00287
40	0.0135	0.00289
50	0.090	0.00294
60	0.005	0.00288

Time delay analysis experiment

To verify the effectiveness of the time delay analysis method, time delay analysis experiments were conducted specifically for linear pipelines and bifurcated pipelines. Using the esterification reaction stage depicted in Figure 7 as the experimental object, the 10 variables involved are listed in Table 3. These include the inlet temperature of the esterification reactor, the flow rate of the esterification reactor, the pressure of the esterification reactor, the siphon pressure, and the outlet pressure of the oligomer. Full-day production data from March 1, 2019, was selected for the calculation, with a sampling period of 2 min. The time window size was set to 2 h based on the polymerization degree of the esterification reaction tank. The experimental calculation results are shown in Table 3.

Figure 7.

Esterification reaction stage.

Table 3.

Tag number description.

Number	Tag number	Variable meaning
1	TI53115	Esterification reactor inlet temperature
2	TI53116A	Esterification reactor flow rate
3	TI53117P	Esterification reactor pressure
4	TI53118A	Siphon pressure
5	TI53120A	Oligomer outlet pressure
6	TIC53210	The initial water temperature,
7	TIC53211	Top temperature of the distillation kettle
8	TIC53212	Pressure of the distillation kettle,
9	TIC53213	Middle reflux extraction temperature
10	TIC53214	Bottom temperature of the distillation kettle

Time delay analysis experiment of linear pipeline

Using the esterification reactor depicted in Figure 7 as the calculation object, five variables were selected: the inlet temperature of the esterification reactor (240°C), the flow rate of the esterification reactor, the pressure of the esterification reactor, the siphon pressure, and the outlet pressure of the oligomer. Full-day production data from March 1, 2019, was selected for the calculation, with a sampling period of 2 min. Part of the data is shown in Table 4. Using the esterification feed temperature TI53115 as the reference tag, the time window size was set to 2 h. The calculation results are presented in Table 5.

Table 4.

Esterification reactor process parameters.

Tag number	Sample data
Tag number	1	2	. . .	19	20
TI53115	240.12	243.68	. . .	250.21	255.88
TI53116A	15.32	15.55	. . .	15.53	15.96
TI53117P	230.12	232.88	. . .	238.45	241.66
TI53118A	50.22	49.75	. . .	53.96	55.13
TI53120A	1.32	1.44	. . .	1.52	1.67

Table 5.

Esterification reactor process parameter delay time.

Tag number	Simulation result (min)	Time delay analysis method (min)
TI53115	0 (referential)	0 (referential)
TI53116A	8	8
TI53117P	15	16
TI53118A	24	21
TI53120A	30	34

Time delay analysis experiment of forked pipeline

Using the distilled water reactor shown in the figure as the experimental object, we selected the initial water temperature, top temperature of the distillation kettle, pressure of the distillation kettle, middle reflux extraction temperature, and bottom temperature of the distillation kettle as calculation variables. Production data for the entire day on March 1, 2019, was chosen for the calculation, with a sampling period of 2 min. Part of the data is shown in Table 6. Using the esterification feed temperature TIC53210 as the reference tag, the time window size was set to 2 h. The calculation results are presented in Table 7.

Table 6.

Distilled water reactor process parameters.

Tag number	Sample data
Tag number	1	2	. . .	19	20
TIC53210	36.15	37.85	. . .	40.21	41.6
TIC53211	75.35	76.32	. . .	80.36	81.23
TIC53212	128.63	129.35	. . .	136.46	140.36
TIC53213	230.36	231.35	. . .	329.36	331.45
TIC53214	80.32	80.98	. . .	85.26	86.33

Table 7.

Distilled water reactor process parameter delay time.

Tag number	Simulation result (min)	Time delay analysis method (min)
TIC53210	0 (referential)	0 (referential)
TIC53211	6	7
TIC53212	22	19
TIC53213	24	22
TIC53214	10	11

Experiment on predicting intrinsic viscosity

In the training of the intrinsic viscosity prediction model, the pooling layer used the max pooling method, the learning rate was set to 0.002, the batch size to 32, the number of epochs to 160, and the optimizer was Adam. In the distilled attention module, we designed a one-dimensional convolution and a max pooling layer with a stride of 2, gradually reducing the length of the sequence data through the same operation to suppress the number of locally sensitive hash attention distillation layers.

To demonstrate the effectiveness of our proposed prediction method, we conducted two sets of comparative experiments. The first set evaluated the performance of Transformer, PIST-CVAE,³⁴ TS-λ GRU,³⁵ and TDA-LSHA-Informer. Table 8 and Figures 8, 9, 10 illustrate that with a sequence length of 100, all methods exhibited relatively low Mean Absolute Error (MAE) and Root Mean Square Error (RMSE), with TDA-LSHA-Informer achieving the lowest MAE of 0.0160 and RMSE of 0.0257. However, as the sequence length increased to 200 and 300, the MAE and RMSE values for each method also increased, indicating reduced prediction accuracy. Nevertheless, TDA-LSHA-Informer consistently outperformed the other models. Specifically, at a sequence length of 200, TDA-LSHA-Informer yielded MAE and RMSE values of 0.0365 and 0.0540, respectively, while at 300, these values were 0.0484 and 0.0619. These findings demonstrate the robust performance of TDA-LSHA-Informer across varying sequence lengths.

Table 8.

Prediction of polymer intrinsic viscosity with different length.

Methods	Length:100		Length:200		Length:300
Methods	MAE	RMSE	MAE	RMSE	MAE	RMSE
Transformer	0.0571	0.0694	0.0915	0.1138	0.1403	0.1752
PIST-CVAE	0.0449	0.0558	0.0733	0.0906	0.1252	0.1531
TS-λGRU	0.0301	0.0322	0.0441	0.0618	0.0517	0.0690
TDA-LSHA-Informer	0.0160	0.0257	0.0365	0.0540	0.0484	0.0619

Figure 8.

Length100: polymer intrinsic viscosity prediction.

Figure 9.

Length200: polymer intrinsic viscosity prediction.

Figure 10.

Length300: polymer intrinsic viscosity prediction.

Next, we evaluate the effectiveness of time delay analysis. The LSHA-Informer represents our proposed method without incorporating time delay analysis. Experimental results indicate that when predicting sequences of equal length, the TDA-LSHA-Informer, which considers time delay, achieves superior prediction accuracy compared to the LSHA-Informer without this consideration (Figure 11).

Figure 11.

Comparative experiment between TDA-LSHA-Informer and LSHA-Informer.

Conclusion

In this paper, we propose a three-stage model for predicting polymer intrinsic viscosity over long time series data. This method effectively addresses the challenge of predicting viscosity over extended sequences while accounting for time delays. Initially, to handle the high dimensionality of process parameters in polymerization, we employ the MIC to extract features. This is achieved through a redundancy analysis method based on approximate Markov blankets. Next, acknowledging the time delays inherent in process industries, we introduce a method for time delay analysis. This approach computes the delay times between different process parameters, aligning closely with simulation results. Lastly, we introduce the TDA-LSHA-Informer model, an innovative framework for polymer viscosity prediction. Experimental results demonstrate the efficacy of our feature selection method in screening relevant process parameters. Our proposed time delay analysis method proves effective, particularly when integrated with the enhanced capabilities of the Informer model, yielding superior performance in long sequence prediction tasks while accommodating time delays.

However, our method relies heavily on high-quality data, and practical challenges such as missing data can impact prediction accuracy. Moreover, the complexity of our model, leveraging multi-head self-attention and locally sensitive hash attention, presents challenges in interpretability. Future research efforts will focus on addressing these limitations.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Program of the National Natural Science Foundation of China under Grant No. 52005099, the Open Project of Henan Key Laboratory of Intelligent Manufacturing of Mechanical Equipment, Zhengzhou University of Light Industry under Grant No. IM202303, and Shanghai Science and Technology Innovation Action Plan under Grant No. 22511101903.

ORCID iD

Jinmao Bi

References

Shi

Gao

Chen

, et al. Online prediction based on the Copula function for the intrinsic viscosity of polyester fiber. Can J Chem Eng 2023; 101(3): 1440–1454.

del Hierro

Boyron

Ortín

. Fully automated instrument for solution viscosity in polymeric materials. Macromol Symp 2022; 406(1): 2200018.

Mylsamy

Krishnasamy

. A review on electrical properties of fiber-reinforced polymer material: fabrication, measurement, and performances. Trans Indian Inst Metals 2023; 76(7): 1691–1708.

Peng

Chen

Hao

. Deep transfer model with source domain segmentation for polyester esterification processes. In: 2022 34th Chinese control and decision conference (CCDC), Hefei, China, 15–17 August 2022, pp.293–298. New York: IEEE.

Ren

Hao

. Modeling of aggregation process based on feature selection extreme learning machine of atomic search algorithm. In: 2021 IEEE 10th data driven control and learning systems conference (DDCLS), Suzhou, China, 14–16 May 2021, pp.1453–1458. New York: IEEE.

Keshavarz

Shafiee

Jazi

. A simple correlation for reliable prediction of intrinsic viscosity (limiting viscosity number) of different polymer-solvent combinations. Fluid Phase Equilib 2022; 557: 113422.

Barton

Himmelblau

. Online prediction of polymer product quality in an industrial reactor using recurrent neural networks. In: Proceedings of international conference on neural networks (ICNN’97), Houston, TX, USA, 12 June 1997, vol. 1, pp.111–114. New York: IEEE.

Gao

Chen

Hao

, et al. Copula-based JITL for polyester fiber polymerization process modeling. In: 2020 Chinese automation congress (CAC), Shanghai, China, 6–8 November 2020, pp.2628–2633. New York: IEEE.

Zhu

Hao

Xie

, et al. Soft sensor based on eXtreme gradient boosting and bidirectional converted gates long short-term memory self-attention network. Neurocomputing 2021; 434: 126–136.

10.

Feng

, et al. Intensification of polymerization processes by reactive extrusion. Ind Eng Chem Res 2021; 60(7): 2791–2806.

11.

Chen

Shao

, et al. Process intensification of polymerization processes with embedded molecular weight distributions models: an advanced optimization approach. Ind Eng Chem Res 2018; 58(15): 6133–6145.

12.

Chang

Feng

, et al. In situ raman spectroscopy real-time monitoring of a polyester polymerization process for subsequent process optimization and control. Ind Eng Chem Res 2022; 61(49): 17993–18003.

13.

Laubriet

LeCorre

Choi

. Two-phase model for continuous final stage melt polycondensation of poly (ethylene terephthalate). 1. Steady-state analysis. Ind Eng Chem Res 1991; 30(1): 2–12.

14.

Castres Saint Martin

Choi

. Two-phase model for continuous final-stage melt polycondensation of poly (ethylene terephthalate). 2. Analysis of dynamic behavior. Ind Eng Chem Res 1991; 30(8): 1712–1718.

15.

Ravindranath

Mashelkar

. Modeling of poly (ethylene terephthalate) reactors. I. A semibatch ester interchange reactor. J Appl Polym Sci 1981; 26(10): 3179–3204.

16.

Ravindranath

Mashelkar

. Modeling of poly (ethylene terephthalate) reactors. II. A continuous transesterification process. J Appl Polym Sci 1982; 27(2): 471–487.

17.

Ravindranath

Mashelkar

. Modelling of poly (ethylene terephthalate) reactors. III. A semibatch prepolymerization process. J Appl Polym Sci 1982; 27(7): 2625–2652.

18.

Ravindranath

Mashelkar

. Modeling of poly (ethylene terephthalate) reactors: 4. A continuous esterification process. Polym Eng Sci 1982; 22(10): 610–618.

19.

Ravindranath

Mashelkar

. Modeling of poly (ethylene terephthalate) reactors: 5. A continuous prepolymerization process. Polym Eng Sci 1982; 22(10): 619–627.

20.

Yuan

Wang

, et al. Variable correlation analysis-based convolutional neural network for far topological feature extraction and industrial predictive modeling. IEEE Trans Instrum Meas 2024; 73: 1–10.

21.

Yuan

Huang

, et al. Quality prediction modeling for industrial processes using multiscale attention-based convolutional neural network. IEEE Trans Cybern 2024; 54: 2696–2707.

22.

Yuan

, et al. Attention-based interval aided networks for data modeling of heterogeneous sampling sequences with missing values in process industry. IEEE Trans Industr Inform 2024; 20(4): 5253–5262.

23.

Zhang

, et al. Data-driven prediction of polymer intrinsic viscosity with incomplete time series data. In: 2023 28th international conference on automation and computing (ICAC), Birmingham, UK, 30 August–1 September 2023, pp.1–6. New York: IEEE.

24.

Bai

Zhao

. A novel transformer-based multi-variable multi-step prediction method for chemical process fault prognosis. Process Saf Environ Prot 2023; 169: 937–947.

25.

Yang

, et al. DTDR–ALSTM: extracting dynamic time-delays to reconstruct multivariate data for improving attention-based LSTM industrial time series prediction models. Knowl Based Syst 2021; 211: 106508.

26.

Yao

. Cooperative deep dynamic feature extraction and variable time-delay estimation for industrial quality prediction. IEEE Trans Industr Inform 2020; 17(6): 3782–3792.

27.

Zhang

Wang

Shen

, et al. Multiscale time-lagged correlation networks for detecting air pollution interaction. Physica A 2022; 602: 127627.

28.

Vaswani

Shazeer

Parmar

, et al. Attention is all you need. Adv Neural Inf Process Syst 2017; 30:5998–6008.

29.

Child

Gray

Radford

, et al. Generating long sequences with sparse transformers. arXiv preprint arXiv:1904.10509, 2019.

30.

, et al. Remaining useful life estimation via transformer encoder enhanced by a gated convolutional unit. J Intell Manuf 2021; 32: 1997–2006.

31.

Wang

, et al. Autoformer: decomposition transformers with auto-correlation for long-term series forecasting. Adv Neural Inf Process Syst 2021; 34: 22419–22430.

32.

Kitaev

Kaiser

Levskaya

. Reformer: the efficient transformer. arXiv preprint arXiv:2001.04451, 2020.

33.

Zhou

Zhang

Peng

, et al. Informer: beyond efficient transformer for long sequence time-series forecasting. Proc AAAI Conf Artif Intell 2021; 35(12): 11106–11115.

34.

Xie

Hao

Huang

, et al. Data-driven modeling based on two-stream λ gated recurrent unit network with soft sensor application. IEEE Trans Ind Electron 2019; 67(8): 7034–7043.

35.

Zhu

Damarla

Hao

, et al. Parallel interaction spatiotemporal constrained variational autoencoder for soft sensor modeling. IEEE Trans Industr Inform 2021; 18(8): 5190–5198.

Data-driven three-stage polymer intrinsic viscosity prediction model with long sequence time series data

Abstract

Keywords

Introduction

Related work

Proposed method

Feature selection method

Time delay analysis method

Long sequence time-series predicting method

Positional encoding

Multi-head self-attention

Locality sensitive hashing Attention

Self-attention distilling

Generative style decoding

Experiments

Data preprocessing

Feature selection experiment

Time delay analysis experiment

Time delay analysis experiment of linear pipeline

Time delay analysis experiment of forked pipeline

Experiment on predicting intrinsic viscosity

Conclusion

Footnotes

Declaration of conflicting interests

Funding

ORCID iD

References