Abstract
With an imbalanced data distribution, traditional machine learning classification algorithms are biased toward the characteristics of the majority class, resulting in poor classification performance on minority-class data. To improve the classification accuracy of minority classes in imbalanced data, this study proposes a novel model, a generative adversarial network with self-attention mechanism oversampling based on a convolutional neural network (GAN-SAMO-CNN). The self-attention mechanism (SAM) of this model focuses on the correlations among data elements of the minority class. The degree of correlation is first obtained by calculating attention scores, which enables effective extraction of the distribution characteristics of the data. Subsequently, a generative adversarial network (GAN) is used to generate samples highly similar to the real ones, reducing the data imbalance. Finally, a CNN classification model is constructed to train on and predict the samples. The experimental results showed that the F1-score, G-mean, and area under the precision-recall curve (AUPRC) of the model were considerably better than those of the other imbalanced data classification methods. The proposed method was then validated on multiple independent test datasets to demonstrate the model's generalizability and robustness.
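As a brief illustration of why F1-score and G-mean, rather than plain accuracy, are reported for imbalanced data, the following is a minimal sketch using the standard textbook definitions of the two metrics. The function names and the toy labels are hypothetical and are not taken from the study; AUPRC is omitted because it requires ranked prediction scores rather than hard labels.

```python
import math

def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN, treating `positive` as the minority class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def f1_score(y_true, y_pred, positive=1):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    tp, fp, _, fn = confusion_counts(y_true, y_pred, positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def g_mean(y_true, y_pred, positive=1):
    """G-mean = sqrt(sensitivity * specificity)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)

# Hypothetical toy data: 2 minority samples (label 1) among 10.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# A classifier that always predicts the majority class is 80% accurate,
# yet both metrics expose that it never finds a minority sample.
always_majority = [0] * 10
accuracy = sum(t == p for t, p in zip(y_true, always_majority)) / len(y_true)
f1_majority = f1_score(y_true, always_majority)
gmean_majority = g_mean(y_true, always_majority)

# A classifier that recovers one of the two minority samples.
partial = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
f1_partial = f1_score(y_true, partial)
gmean_partial = g_mean(y_true, partial)
```

On this toy data, the always-majority classifier reaches an accuracy of 0.8 while its F1-score and G-mean are both 0, which is the failure mode that oversampling the minority class is meant to address.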
