Abstract
Facial Expression Recognition (FER) has become increasingly important in intelligent human-computer interaction systems in recent years, yet the complexity and ambiguity of target emotions keep FER accuracy low. This study proposes an attentional residual network with a spatial transformer mechanism for FER, establishing an effective emotion recognition model. First, the learnable Spatial Transformer Network (STN) module is introduced to actively transform the feature map so that the network learns a more general distortion invariance. Second, the parameters and structure of ResNet18 are adjusted, and the network is connected to the STN in an end-to-end manner. Finally, a Squeeze-and-Excitation (SE) block in which the Mish function replaces the ReLU function improves the stability and precision of the channel weight adjustment. Verification was carried out on three public datasets: FER2013, CK+, and JAFFE. The FER2013 dataset is split into three parts: a training set, a public validation set, and a private validation set. Ten-fold cross-validation was used for the small CK+ and JAFFE datasets. Accuracy rates of 73.25%, 99.18%, and 97.10% were attained on the FER2013, CK+, and JAFFE datasets, respectively.
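The abstract's key modification is an SE block whose excitation stage uses Mish instead of ReLU. A minimal NumPy sketch of that idea is shown below; the function names, reduction ratio `r`, and random weights are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def mish(x):
    # Mish activation: x * tanh(softplus(x))
    return x * np.tanh(np.log1p(np.exp(x)))

def se_block(feature_map, w1, w2):
    """Squeeze-and-Excitation channel reweighting (Mish in the bottleneck).

    feature_map: (C, H, W); w1: (C//r, C); w2: (C, C//r).
    """
    # Squeeze: global average pooling over the spatial dims -> (C,)
    z = feature_map.mean(axis=(1, 2))
    # Excitation: bottleneck FC layers; Mish replaces the usual ReLU,
    # then a sigmoid gate maps each channel weight into (0, 1)
    s = 1.0 / (1.0 + np.exp(-(w2 @ mish(w1 @ z))))
    # Scale: reweight each channel by its learned importance
    return feature_map * s[:, None, None]

# Toy example with 8 channels and reduction ratio r = 2
rng = np.random.default_rng(0)
c, r = 8, 2
fmap = rng.standard_normal((c, 4, 4))
w1 = rng.standard_normal((c // r, c)) * 0.1
w2 = rng.standard_normal((c, c // r)) * 0.1
out = se_block(fmap, w1, w2)
```

The output has the same shape as the input feature map; each channel is multiplied by a single scalar gate, which is how the SE block adjusts channel weights.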