Abstract
The rapid development of big data and artificial intelligence has made text topic classification an important part of natural language processing research and has also driven improvements in pre-trained model performance. To better promote the application of pre-trained models and improve text topic classification, this paper introduces the BERT (Bidirectional Encoder Representations from Transformers) model for an in-depth exploration of English text topic classification. The study first preprocesses the English text dataset through denoising, lowercasing, and stopword removal, and then augments the data with synonym substitution. The BERT model is then pre-trained and optimized, a BERT-based model structure is designed, and a topic classifier is built on top of it. Finally, the practical effectiveness of the BERT-based model in English text topic classification is evaluated. The results show that with five classes the BERT-based model achieves a peak accuracy of 96.49%, and over 50 test runs its recall and F1 score reach 96.10% and 91.66%, respectively. These results indicate that applying the BERT-based model to English text topic classification is feasible: it improves accuracy and recall, reduces classification time, and thereby raises overall classification efficiency.
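By way of illustration, the sketch below shows a minimal version of the pipeline the abstract outlines: denoising, lowercasing, and stopword removal, synonym-substitution augmentation, and five-class topic prediction with a BERT classifier. It assumes the Hugging Face transformers library and NLTK; the function names (clean_text, synonym_substitute, classify) and hyperparameters (max_length=128, substitution probability p) are illustrative assumptions, not the paper's actual implementation.

```python
import random
import re

import torch
from nltk.corpus import stopwords, wordnet
from transformers import BertForSequenceClassification, BertTokenizer

# Requires nltk.download("stopwords") and nltk.download("wordnet") beforehand.
STOPWORDS = set(stopwords.words("english"))

def clean_text(text: str) -> str:
    """Denoise, lowercase, and remove stopwords, as described in the abstract."""
    text = re.sub(r"[^a-zA-Z\s]", " ", text)  # strip non-alphabetic noise
    tokens = text.lower().split()
    return " ".join(t for t in tokens if t not in STOPWORDS)

def synonym_substitute(text: str, p: float = 0.1) -> str:
    """Data augmentation: randomly replace tokens with a WordNet synonym.
    The probability p and the choice of the first lemma are assumptions."""
    out = []
    for tok in text.split():
        syns = wordnet.synsets(tok)
        if syns and random.random() < p:
            out.append(syns[0].lemmas()[0].name().replace("_", " "))
        else:
            out.append(tok)
    return " ".join(out)

# Hypothetical five-class setup matching the reported experiments.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=5
)

def classify(text: str) -> int:
    """Return the predicted topic index for one document."""
    inputs = tokenizer(
        clean_text(text), truncation=True, max_length=128, return_tensors="pt"
    )
    with torch.no_grad():
        logits = model(**inputs).logits
    return int(logits.argmax(dim=-1))
```

In practice the classification head would be fine-tuned on the augmented dataset before classify is used; the snippet only fixes the interface of the preprocessing, augmentation, and prediction steps.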
