Abstract
Multimodal discourse analysis enhances precision and contextual understanding in speech acts by integrating modalities such as visual cues, text, non-verbal signals, and gaze tracking. This study examines the effectiveness of multimodal discourse in improving speech act classification through combined visual, auditory, and non-verbal data. A mixed-method approach was employed, combining quantitative data from 370 communication professionals, analyzed with the Statistical Package for the Social Sciences (SPSS), with qualitative insights from interviews and focus groups. Findings indicate that visual cues significantly enhance speech act classification performance, while audio intonation improves accuracy under noisy conditions. The integration of text and non-verbal data further supports deeper contextual understanding, particularly benefiting indirect speech act recognition and overall multimodal fusion effectiveness. The study's holistic approach, which combines visual, audio, textual, and gaze-tracking modalities, goes beyond previous research that focused on isolated factors of speech interpretation. Multimodality significantly improves accuracy and contextual comprehension in speech act classification, demonstrating that communication analysis should extend beyond textual content to include audio and non-verbal cues for a fuller understanding.
