Abstract
The increasing volume of multilingual news broadcasts highlights the need for advanced systems capable of transforming speech into semantically comparable text across languages. Traditional speech-to-text and textual similarity methods often fall short in handling linguistic diversity, contextual ambiguity, and cross-lingual semantic alignment. To overcome these limitations, we introduce an integrated Transformer–Graph Neural Network (GNN) framework for multilingual news speech-to-text similarity modeling. The approach uses a Transformer encoder to extract deep contextual embeddings from speech inputs, capturing sequential and contextual nuances. These embeddings are then structured into graphs that represent semantic relations among words, phrases, and sentences, and a GNN refines these graph-based representations by modeling relational dependencies across languages. Finally, a cross-lingual semantic alignment module produces similarity scores, enabling accurate transformation of multilingual speech into comparable text. Experiments on benchmark multilingual news video datasets in English, Hindi, Marathi, and Tamil show that the framework consistently outperforms baseline models, including standalone Transformers and GNNs, with gains of 7.8% in semantic similarity accuracy, 6.1% in BLEU score, and 8.4% in cross-lingual alignment efficiency. Relative to the baselines, the proposed approach also achieved a 4.8% relative improvement in semantic similarity and a 3.1% reduction in word error rate. The model further demonstrated robustness to noisy input, code-switching, and low-resource language scenarios, making it suitable for practical multilingual news applications. Future directions include extending the framework for real-time deployment, expanding support to underrepresented languages, and incorporating multimodal news data for enriched global media analysis.
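To make the pipeline summarized above concrete, the following is a minimal PyTorch sketch of one plausible reading of the framework: a Transformer encoder produces contextual embeddings, a similarity-based graph is built over them, a single graph-convolution step refines the node representations, and cosine similarity between pooled sentence vectors serves as the cross-lingual alignment score. The module names, dimensions, k-nearest-neighbour graph construction, and mean pooling are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumed design, not the authors' code) of the abstract's pipeline:
# Transformer encoder -> graph over token embeddings -> one GNN refinement step
# -> cross-lingual similarity score.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpeechTextSimilaritySketch(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_layers=2, k_neighbors=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)   # contextual embeddings
        self.gnn_weight = nn.Linear(d_model, d_model)           # one graph-convolution step
        self.k = k_neighbors

    def build_graph(self, h):
        # Connect each token to its k most similar tokens (semantic relations).
        sim = F.cosine_similarity(h.unsqueeze(2), h.unsqueeze(1), dim=-1)  # (B, T, T)
        topk = sim.topk(self.k, dim=-1).indices
        adj = torch.zeros_like(sim).scatter_(-1, topk, 1.0)
        adj = adj + adj.transpose(1, 2)                          # symmetrize
        deg = adj.sum(-1, keepdim=True).clamp(min=1.0)
        return adj / deg                                         # row-normalized adjacency

    def forward(self, feats_a, feats_b):
        # feats_*: (batch, time, d_model) speech-derived features for two languages
        pooled = []
        for feats in (feats_a, feats_b):
            h = self.encoder(feats)                              # Transformer embeddings
            adj = self.build_graph(h)
            h = F.relu(self.gnn_weight(adj @ h))                 # GNN refinement
            pooled.append(h.mean(dim=1))                         # sentence-level vector
        return F.cosine_similarity(pooled[0], pooled[1], dim=-1) # alignment/similarity score
```

In this reading, the similarity score returned by the forward pass is what the cross-lingual alignment module would threshold or rank to decide whether two speech segments in different languages convey comparable content.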
