Abstract
This article proposes a novel procedure for missing value imputation that combines correlation analysis with a Gaussian mixture model (GMM). First, the normality of the data is assessed using a normality assessment algorithm, and the correlation coefficient is then calculated by the method appropriate to the result. Next, the original correlation matrix is binarized using a chosen threshold, and the binarized matrix is used to group the variables according to the correlations among them. Each group is imputed according to its size: mean imputation for single-variable groups and GMM-based imputation for multi-variable groups. For a multi-variable group, a GMM is trained using the Figueiredo-Jain algorithm, and missing values are then imputed using the mean derived from the model. Finally, experiments on the Tennessee Eastman process and a gold hydrometallurgy process verify the superiority of the proposed algorithm.
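The pipeline summarized above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the Shapiro-Wilk test, the Pearson/Spearman choice, the threshold of 0.6, and training on complete rows only are all hypothetical choices. scikit-learn does not provide the Figueiredo-Jain algorithm, so BIC-based selection of the number of components stands in for it here, and "the mean derived from the model" is read as the conditional mean of the missing entries given the observed ones.

```python
import numpy as np
from scipy.sparse.csgraph import connected_components
from scipy.stats import rankdata, shapiro
from sklearn.mixture import GaussianMixture


def group_variables(X, threshold=0.6, alpha=0.05):
    """Group columns of X by thresholded correlation (complete rows only)."""
    Xc = X[~np.isnan(X).any(axis=1)]
    # Assumption: Shapiro-Wilk per column; Pearson if all pass, else Spearman.
    normal = all(shapiro(Xc[:, j])[1] > alpha for j in range(Xc.shape[1]))
    data = Xc if normal else rankdata(Xc, axis=0)  # Spearman = Pearson on ranks
    corr = np.corrcoef(data, rowvar=False)
    binary = (np.abs(corr) >= threshold).astype(int)  # binarized matrix
    # Variables linked directly or transitively fall into the same group.
    _, labels = connected_components(binary, directed=False)
    return [np.flatnonzero(labels == g) for g in np.unique(labels)]


def gmm_conditional_mean(gmm, x):
    """Fill NaNs in x with E[x_missing | x_observed] under a fitted GMM."""
    m = np.isnan(x)
    o = ~m
    if not m.any():
        return x.copy()
    log_r = np.log(gmm.weights_)  # unnormalized log responsibilities
    cond = []
    for k in range(gmm.n_components):
        mu, S = gmm.means_[k], gmm.covariances_[k]
        if o.any():
            Soo = S[np.ix_(o, o)]
            d = x[o] - mu[o]
            sol = np.linalg.solve(Soo, d)
            _, logdet = np.linalg.slogdet(Soo)
            log_r[k] += -0.5 * (d @ sol + logdet)
            cond.append(mu[m] + S[np.ix_(m, o)] @ sol)
        else:
            cond.append(mu[m])  # nothing observed: fall back to component mean
    r = np.exp(log_r - log_r.max())
    r /= r.sum()
    out = x.copy()
    out[m] = r @ np.array(cond)
    return out


def impute(X, threshold=0.6, max_components=5, seed=0):
    """Mean imputation for singleton groups, GMM imputation otherwise."""
    X_out = X.copy()
    complete = ~np.isnan(X).any(axis=1)
    for cols in group_variables(X, threshold):
        if len(cols) == 1:
            j = cols[0]
            X_out[np.isnan(X[:, j]), j] = np.nanmean(X[:, j])
            continue
        train = X[complete][:, cols]
        # Assumption: BIC selection replaces the Figueiredo-Jain algorithm.
        fits = [GaussianMixture(k, covariance_type="full",
                                random_state=seed).fit(train)
                for k in range(1, max_components + 1)]
        gmm = min(fits, key=lambda g: g.bic(train))
        for i in np.flatnonzero(np.isnan(X[:, cols]).any(axis=1)):
            X_out[i, cols] = gmm_conditional_mean(gmm, X[i, cols])
    return X_out
```

Calling `impute(X)` on a two-dimensional float array with `NaN` marking missing entries returns a copy with those entries filled in and the observed entries untouched. Grouping by connected components of the binarized matrix is one plausible reading of "group variables according to the correlation among them"; other grouping rules would fit the description equally well.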
