GFB-YOLO: Dual-path edge enhancement and adaptive multi-scale fusion for facial expression recognition

Abstract

To address inadequate multi-scale feature adaptation and the loss of fine-grained detail in facial expression recognition, we propose GFB-YOLO (Global Edge Information Transfer Path + Flexible Inception Mixer Bottleneck-YOLO), a dual-path enhancement network built on the YOLOv11 object detection framework. First, we introduce a Flexible Inception Mixer Bottleneck (FIMB) into the backbone network to reconstruct the convolution structure; it uses a deformable convolution kernel weighting mechanism that lets the model’s receptive field expand adaptively according to the scale of the input face. Next, we design a global edge information transfer path (GEITP) that operates in parallel with the detection backbone: a Sobel Edge Generator (SEG) constructs a three-level feature pyramid, and a Detail Fusion Module (DFM) enhances these features for cross-scale edge reinforcement. Experimental results show that the model achieves an expression detection accuracy of 84.6% on the original RAF-DB dataset, a 6.1% improvement over the original YOLOv11, and significantly outperforms other models across multiple expressions. The gains are most pronounced for fear and disgust, which indicates that the proposed modules extract the fine-grained features needed to recognize subtle and easily confused facial expressions. This work overcomes a limitation of traditional expression recognition models, which focus more on classification than on localization, and offers a new approach to real-time emotion computing based on the object detection paradigm.
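The idea of a Sobel-based three-level edge pyramid can be illustrated with a short NumPy sketch. It is not the authors' SEG: the abstract does not specify how levels are downsampled or how edges are injected into the backbone, so the 2×2 max pooling, the edge padding, and every function name below (`filter3x3`, `sobel_magnitude`, `downsample2x`, `edge_pyramid`) are assumptions made for illustration only. The DFM fusion step is not shown.

```python
import numpy as np

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T


def filter3x3(img, kernel):
    """Cross-correlate a 2-D image with a 3x3 kernel (edge padding, same-size output)."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def sobel_magnitude(img):
    """Gradient magnitude from the horizontal and vertical Sobel responses."""
    gx = filter3x3(img, SOBEL_X)
    gy = filter3x3(img, SOBEL_Y)
    return np.hypot(gx, gy)


def downsample2x(img):
    """2x2 max pooling (an assumed choice); odd trailing rows/columns are cropped."""
    h2, w2 = img.shape[0] // 2, img.shape[1] // 2
    return img[:h2 * 2, :w2 * 2].reshape(h2, 2, w2, 2).max(axis=(1, 3))


def edge_pyramid(img, levels=3):
    """Return a list of edge maps, each half the resolution of the previous one."""
    pyramid = [sobel_magnitude(np.asarray(img, dtype=np.float64))]
    for _ in range(levels - 1):
        pyramid.append(downsample2x(pyramid[-1]))
    return pyramid


# Toy input: a 16x16 image with a vertical step edge in the middle.
image = np.zeros((16, 16))
image[:, 8:] = 1.0
pyramid = edge_pyramid(image)
shapes = [level.shape for level in pyramid]
print(shapes)
```

Max pooling is used here because it keeps thin edge responses visible at coarser levels, whereas average pooling would dilute them; the paper may well make a different choice.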

Keywords

YOLO; facial expression recognition; deep learning; driving behavior analysis; smart cockpit

