Abstract
Corrosion is a major factor affecting the long-term safety and stable operation of refining units. In corrosion risk management, mechanism-based or data-driven models are commonly used for corrosion rate prediction. However, mechanism-based models alone are often difficult to construct and exhibit low predictive accuracy, while purely data-driven models inherently suffer from poor interpretability and limited extrapolation ability. This paper proposes a data-augmented data-mechanism fusion modelling method to develop a corrosion rate prediction model for the low-temperature corrosion mechanisms of HCl + H2S + H2O systems, based on multi-source data from a distillation column in the crude distillation unit (CDU). First, the corrosion mechanisms in the overhead system of the distillation column, a critical corrosion-prone area, are analysed, and semi-empirical and chemical kinetic models are constructed based on key influencing factors. Subsequently, the isolation forest (iF) algorithm is employed to pre-process the original dataset. The semi-empirical model is used to extend the dataset, and missing data points are supplemented using the k-nearest neighbour (KNN) model. Furthermore, a Generative Adversarial Network (GAN) is utilised to fuse and augment the data generated by the mechanism-based model with field data. Finally, a random forest (RF) model is developed for corrosion rate prediction based on the data-augmented data-mechanism fusion framework. The model achieves a root mean squared error (RMSE) of 0.00598, a mean absolute error (MAE) of 0.00419, and an R² of 0.802 on the augmented dataset. Uncertainty analysis, sensitivity analysis, and a comparative evaluation of the predictive performance of multiple models show that the proposed data-augmented RF model achieves the highest accuracy and stronger generalisation ability under abnormal conditions.
The fusion approach thereby compensates for the limitations of single models in terms of predictive accuracy, extrapolation, and interpretability under real-world conditions, and can provide valuable guidance for condition-based maintenance and risk identification of equipment.
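For readers unfamiliar with the three reported metrics, the following minimal sketch shows how RMSE, MAE, and R² are conventionally computed from measured and predicted corrosion rates. The function name `regression_metrics` and the sample values are hypothetical and purely illustrative; they are not the data or the evaluation code used in this work.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (rmse, mae, r2) for two equal-length sequences of floats."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # Residual sum of squares, shared by RMSE and R².
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n

    # Total sum of squares around the mean of the measured values.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all measured values are identical")
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2


# Hypothetical corrosion rates (illustrative values only, not from the paper).
measured = [0.10, 0.12, 0.15, 0.11]
predicted = [0.11, 0.12, 0.14, 0.12]
rmse, mae, r2 = regression_metrics(measured, predicted)
```

RMSE penalises large individual errors more strongly than MAE, which is why both are usually reported together, while R² expresses the share of variance in the measured rates that the model explains.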
