Semantic relationship extraction of English long sentences and quality optimization of machine translation based on BERT model

Abstract

As globalization accelerates, cross-linguistic communication has become an indispensable part of daily life, and the status of English as an international lingua franca has become increasingly prominent. Long English sentences contain complex semantic relations, and existing machine translation systems often misinterpret them and produce distorted translations, which seriously affects the accuracy and coherence of information transmission. To address this problem, this study focuses on BERT, a pre-trained deep learning language model, and explores its potential for semantic relationship extraction from long English sentences and for machine translation quality optimization. First, the BERT model is fine-tuned to specialize in long-sentence structure analysis and semantic relationship extraction. Experiments show that the model reaches an F1 score of 89.6% on the standard evaluation dataset CoNLL 2004, which is significantly higher than the previous best record. Building on this extracted semantic information, we then optimize a neural machine translation (NMT) system by introducing a novel attention guidance mechanism, which addresses the long-distance dependency problem and significantly reduces ambiguity and omission in translation. In the blind test of the WMT'14 English-German translation task, the optimized NMT system achieves a BLEU score of 29.5, which is 3.2 points higher than that of the baseline model, demonstrating the effectiveness of the method in improving translation quality.
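The abstract reports an F1 score for relation extraction but does not say how it is computed. The following is a minimal sketch of one common convention, assuming strict matching of (head, relation, tail) triples and micro-averaging over sentences. The function name `micro_prf`, the `Triple` alias, and the example triples are hypothetical and are not taken from the paper.

```python
from typing import Iterable, Set, Tuple

# Assumed representation: (head entity, relation label, tail entity).
Triple = Tuple[str, str, str]


def micro_prf(gold: Iterable[Set[Triple]],
              pred: Iterable[Set[Triple]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over per-sentence triple sets."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # predicted triples that exactly match a gold triple
        fp += len(p - g)   # predicted triples with no gold match
        fn += len(g - p)   # gold triples the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Illustrative data only: two sentences with made-up gold and predicted triples.
gold = [
    {("Smith", "Work_For", "IBM"), ("IBM", "OrgBased_In", "New York")},
    {("Lee", "Live_In", "Boston")},
]
pred = [
    {("Smith", "Work_For", "IBM")},
    {("Lee", "Live_In", "Boston"), ("Lee", "Work_For", "Boston")},
]
precision, recall, f1 = micro_prf(gold, pred)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Under this convention a predicted relation counts as correct only if both entities and the label match exactly. Other conventions, such as macro-averaging or relaxed entity-boundary matching, can yield different scores for the same predictions.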

Keywords

BERT, English long sentence, semantics, machine translation

