Abstract
Facial expressions play a vital role in non-verbal communication, conveying a wide range of emotions and messages. Although prior research has achieved notable advances through architecture design or dataset-specific optimization, few studies have integrated multiple advanced techniques into a unified facial expression recognition (FER) pipeline. Addressing this gap, we propose a comprehensive approach that combines (i) multiple pre-trained CNNs, (ii) MTCNN-based face detection for improved facial region localization, and (iii) Grad-CAM-based interpretability. While MTCNN enhances the quality of face localization, it may slightly affect classification accuracy by focusing on cleaner yet more challenging samples. We evaluate four pre-trained models (DenseNet121, ResNet-50, ResNet-18, and MobileNetV2) on two datasets: RAF-DB and Cleaned-FER2013. The proposed pipeline demonstrates consistent improvements in interpretability and overall system robustness. The results emphasize that integrating face detection, transfer learning, and interpretability techniques within a single framework can significantly enhance the transparency and reliability of FER systems. Combining FER with EEG-based systems significantly enhances the emotional intelligence of brain-computer interfaces, enabling more adaptive and personalized user experiences. In this way, the paper bridges the gap between affective computing and cognitive neuroscience, aligning closely with EEG-centered interaction methodologies. In addition, understanding the relationship between facial expressions of emotion and EEG signals will be an important contribution to the literature.
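The sketch below is a minimal illustration of the three-stage pipeline the abstract describes (MTCNN face detection, a pre-trained CNN backbone, and Grad-CAM visualization), assuming PyTorch with the facenet-pytorch MTCNN implementation and a torchvision ResNet-18. The seven-class output head, the choice of `layer4` for Grad-CAM, and the input file name are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch: MTCNN face crop -> pre-trained CNN -> Grad-CAM heatmap.
# Assumptions (not from the paper): 7 expression classes, Grad-CAM on
# ResNet-18's layer4, and an input image "face.jpg".
import torch
import torch.nn.functional as F
from facenet_pytorch import MTCNN
from torchvision import models
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

# (ii) MTCNN face detection: localize and crop the facial region.
detector = MTCNN(image_size=224, margin=20, device=device)

# (i) Pre-trained CNN backbone with a fine-tuned 7-class expression head.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = torch.nn.Linear(model.fc.in_features, 7)
model.eval().to(device)

# (iii) Grad-CAM: hook activations and gradients at the last conv block,
# then weight each feature map by its spatially pooled gradient.
acts, grads = {}, {}
model.layer4.register_forward_hook(lambda m, i, o: acts.update(v=o))
model.layer4.register_full_backward_hook(lambda m, gi, go: grads.update(v=go[0]))

def grad_cam(face_batch):
    logits = model(face_batch)
    cls = int(logits.argmax())          # explain the predicted class
    model.zero_grad()
    logits[0, cls].backward()
    weights = grads["v"].mean(dim=(2, 3), keepdim=True)  # pooled gradients
    cam = F.relu((weights * acts["v"]).sum(dim=1))       # weighted sum + ReLU
    cam = cam / (cam.max() + 1e-8)                       # normalize to [0, 1]
    return cls, cam

img = Image.open("face.jpg")            # hypothetical input image
face = detector(img)                    # aligned crop tensor, or None
if face is not None:
    pred, heatmap = grad_cam(face.unsqueeze(0).to(device))
    print("predicted class:", pred, "CAM shape:", tuple(heatmap.shape))
```

Upsampling `heatmap` to the crop size and overlaying it on the detected face would reproduce the kind of interpretability visualization the pipeline targets; a trained FER head, rather than the randomly initialized one above, would be needed for meaningful maps.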
