Abstract
Emotional responses to visual art arise from interacting perceptual and physiological processes, yet existing methods treat these modalities in isolation, limiting individualized affective modeling. This study presents BioArt-Net, a computational framework that integrates visual semantic analysis with physiological signal processing for viewer-centric art emotion recognition, with a focus on cross-modal fusion methodologies. The framework comprises five modules: visual semantics extracted by a fine-tuned ViT (16 × 16 patches, 32% dimensionality reduction); physiological encoding of EEG (wavelet transform) and GSR/PPG (1D CNNs) with 94% data integrity; token-level attention that aligns the modalities; a compound loss (cross-entropy + MSE + entropy) that reduces redundancy by 27%; and optimization via knowledge distillation (48% fewer parameters, 89 ms latency). On the 520-session BioArt-Emotion Dataset, accuracy reaches 0.87 (vs. 0.76 unimodal) with F1 = 0.85. Cross-modal attention contributes an 11% accuracy gain, and physiological reweighting stabilizes results. This work advances computational neuroaesthetics, fitting the journal’s focus on innovative interdisciplinary frameworks.
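The compound objective named in the abstract (cross-entropy + MSE + entropy) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the weighting coefficients `lam_mse` and `lam_ent`, the continuous affect target, and all variable names are assumptions.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def compound_loss(logits, labels, affect_pred, affect_true,
                  lam_mse=0.5, lam_ent=0.1):
    """Illustrative compound loss: cross-entropy on discrete emotion
    classes, MSE on a continuous affect target, and an entropy penalty
    that discourages diffuse (redundant) class distributions.
    lam_mse and lam_ent are hypothetical weights."""
    p = softmax(logits)
    n = logits.shape[0]
    ce = -np.log(p[np.arange(n), labels] + 1e-12).mean()
    mse = np.mean((affect_pred - affect_true) ** 2)
    ent = -np.sum(p * np.log(p + 1e-12), axis=-1).mean()
    return ce + lam_mse * mse + lam_ent * ent
```

In practice such a loss would be written against an autodiff framework; the numpy form above only shows how the three terms combine into one scalar objective.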