Abstract
Facial Expression Recognition (FER) has become increasingly important in intelligent human-computer interaction systems in recent years, yet the complexity and ambiguity of target emotions keep FER accuracy low. This study proposes an attentional residual network with a spatial transformer mechanism for FER, establishing an effective emotion recognition model. First, the learnable Spatial Transformer Network (STN) module is introduced to actively transform the feature map so that the network learns a more general distortion invariance. Second, the parameters and structure of ResNet18 are adjusted, and the network is connected to the STN in an end-to-end manner. Finally, a Squeeze-and-Excitation (SE) block in which the Mish function replaces the ReLU function improves the stability and precision of the channel weight adjustment. Verification was carried out on three public datasets: FER2013, CK+, and JAFFE. The FER2013 dataset is split into three parts: a training set, a public validation set, and a private validation set. Ten-fold cross-validation was used for the small CK+ and JAFFE datasets. Accuracy rates of 73.25%, 99.18%, and 97.10% were attained on the FER2013, CK+, and JAFFE datasets, respectively.
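The abstract's key modification is an SE block whose excitation stage uses Mish instead of ReLU. A minimal NumPy sketch of that idea is shown below; the function names, reduction ratio `r`, and random weights are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def mish(x):
    # Mish activation: x * tanh(softplus(x))
    return x * np.tanh(np.log1p(np.exp(x)))

def se_block(feature_map, w1, w2):
    """Squeeze-and-Excitation channel reweighting (Mish in the bottleneck).

    feature_map: (C, H, W); w1: (C//r, C); w2: (C, C//r).
    """
    # Squeeze: global average pooling over the spatial dims -> (C,)
    z = feature_map.mean(axis=(1, 2))
    # Excitation: bottleneck FC layers; Mish replaces the usual ReLU,
    # then a sigmoid gate maps each channel weight into (0, 1)
    s = 1.0 / (1.0 + np.exp(-(w2 @ mish(w1 @ z))))
    # Scale: reweight each channel by its learned importance
    return feature_map * s[:, None, None]

# Toy example with 8 channels and reduction ratio r = 2
rng = np.random.default_rng(0)
c, r = 8, 2
fmap = rng.standard_normal((c, 4, 4))
w1 = rng.standard_normal((c // r, c)) * 0.1
w2 = rng.standard_normal((c, c // r)) * 0.1
out = se_block(fmap, w1, w2)
```

The output has the same shape as the input feature map; each channel is multiplied by a single scalar gate, which is how the SE block adjusts channel weights.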