Abstract
Multimodal discourse analysis enhances precision and contextual understanding in speech acts by integrating modalities such as visual cues, text, non-verbal signals, and gaze tracking. This study examines the effectiveness of multimodal discourse in improving speech act classification through combined visual, auditory, and non-verbal data. A mixed-method approach was employed, combining quantitative data from 370 communication professionals, analyzed with the Statistical Package for the Social Sciences (SPSS), with qualitative insights from interviews and focus groups. Findings indicate that visual cues significantly enhance speech act classification performance, while audio intonation improves accuracy under noisy conditions. The integration of text and non-verbal data further supports deeper contextual understanding, particularly benefiting indirect speech act recognition and overall multimodal fusion effectiveness. The study's holistic approach, which combines visual, audio, textual, and gaze-tracking modalities, goes beyond previous research that focused on isolated factors of speech interpretation. Multimodality significantly improves accuracy and contextual comprehension in speech act classification, demonstrating that communication analysis should extend beyond textual content to include audio and non-verbal cues for a fuller understanding.
