Abstract
The increasing volume of multilingual news broadcasts highlights the need for advanced systems capable of transforming speech into semantically comparable text across languages. Traditional speech-to-text and textual similarity methods often fall short in handling linguistic diversity, contextual ambiguity, and cross-lingual semantic alignment. To overcome these limitations, we introduce an integrated Transformer–Graph Neural Network (GNN) framework for multilingual news speech-to-text similarity modeling. The approach uses a Transformer encoder to extract deep contextual embeddings from speech inputs, capturing sequential and contextual nuances. These embeddings are then structured into graphs that represent semantic relations among words, phrases, and sentences, and a GNN refines these graph-based representations by modeling relational dependencies across languages. Finally, a cross-lingual semantic alignment module produces similarity scores, enabling accurate transformation of multilingual speech into comparable text. Experiments on benchmark multilingual news video datasets in English, Hindi, Marathi, and Tamil show that the framework consistently outperforms baseline models, including standalone Transformers and GNNs, with gains of 7.8% in semantic similarity accuracy, 6.1% in BLEU score, and 8.4% in cross-lingual alignment efficiency. Relative to the baselines, the proposed approach also achieved a 4.8% relative improvement in semantic similarity and a 3.1% reduction in word error rate. The model further demonstrated robustness to noisy input, code-switching, and low-resource language scenarios, making it suitable for practical multilingual news applications. Future directions include extending the framework for real-time deployment, expanding support to underrepresented languages, and incorporating multimodal news data for enriched global media analysis.
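To make the pipeline summarized above concrete, the following is a minimal PyTorch sketch of one plausible reading of the framework: a Transformer encoder produces contextual embeddings, a similarity-based graph is built over them, a single graph-convolution step refines the node representations, and cosine similarity between pooled sentence vectors serves as the cross-lingual alignment score. The module names, dimensions, k-nearest-neighbour graph construction, and mean pooling are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumed design, not the authors' code) of the abstract's pipeline:
# Transformer encoder -> graph over token embeddings -> one GNN refinement step
# -> cross-lingual similarity score.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpeechTextSimilaritySketch(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_layers=2, k_neighbors=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)   # contextual embeddings
        self.gnn_weight = nn.Linear(d_model, d_model)           # one graph-convolution step
        self.k = k_neighbors

    def build_graph(self, h):
        # Connect each token to its k most similar tokens (semantic relations).
        sim = F.cosine_similarity(h.unsqueeze(2), h.unsqueeze(1), dim=-1)  # (B, T, T)
        topk = sim.topk(self.k, dim=-1).indices
        adj = torch.zeros_like(sim).scatter_(-1, topk, 1.0)
        adj = adj + adj.transpose(1, 2)                          # symmetrize
        deg = adj.sum(-1, keepdim=True).clamp(min=1.0)
        return adj / deg                                         # row-normalized adjacency

    def forward(self, feats_a, feats_b):
        # feats_*: (batch, time, d_model) speech-derived features for two languages
        pooled = []
        for feats in (feats_a, feats_b):
            h = self.encoder(feats)                              # Transformer embeddings
            adj = self.build_graph(h)
            h = F.relu(self.gnn_weight(adj @ h))                 # GNN refinement
            pooled.append(h.mean(dim=1))                         # sentence-level vector
        return F.cosine_similarity(pooled[0], pooled[1], dim=-1) # alignment/similarity score
```

In this reading, the similarity score returned by the forward pass is what the cross-lingual alignment module would threshold or rank to decide whether two speech segments in different languages convey comparable content.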
