Abstract
The rapid development of big data and artificial intelligence has made text topic classification an important part of natural language processing research and has also driven improvements in pre-trained model performance. To better promote the application of pre-trained models and improve text topic classification, this paper introduces the BERT (Bidirectional Encoder Representations from Transformers) model for an in-depth exploration of English text topic classification. The study first preprocesses the English text dataset through denoising, lowercasing, and stopword removal, and then augments the data with synonym substitution. The BERT model is then pre-trained and optimized, a BERT-based model structure is designed, and a topic classifier is built on top of it. Finally, the practical effectiveness of the BERT-based model in English text topic classification is evaluated. The results show that with five classes the BERT-based model achieves a peak accuracy of 96.49%, and over 50 test runs its recall and F1 score reach 96.10% and 91.66%, respectively. These results indicate that applying the BERT-based model to English text topic classification is feasible: it improves accuracy and recall, reduces classification time, and thereby raises overall classification efficiency.
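By way of illustration, the sketch below shows a minimal version of the pipeline the abstract outlines: denoising, lowercasing, and stopword removal, synonym-substitution augmentation, and five-class topic prediction with a BERT classifier. It assumes the Hugging Face transformers library and NLTK; the function names (clean_text, synonym_substitute, classify) and hyperparameters (max_length=128, substitution probability p) are illustrative assumptions, not the paper's actual implementation.

```python
import random
import re

import torch
from nltk.corpus import stopwords, wordnet
from transformers import BertForSequenceClassification, BertTokenizer

# Requires nltk.download("stopwords") and nltk.download("wordnet") beforehand.
STOPWORDS = set(stopwords.words("english"))

def clean_text(text: str) -> str:
    """Denoise, lowercase, and remove stopwords, as described in the abstract."""
    text = re.sub(r"[^a-zA-Z\s]", " ", text)  # strip non-alphabetic noise
    tokens = text.lower().split()
    return " ".join(t for t in tokens if t not in STOPWORDS)

def synonym_substitute(text: str, p: float = 0.1) -> str:
    """Data augmentation: randomly replace tokens with a WordNet synonym.
    The probability p and the choice of the first lemma are assumptions."""
    out = []
    for tok in text.split():
        syns = wordnet.synsets(tok)
        if syns and random.random() < p:
            out.append(syns[0].lemmas()[0].name().replace("_", " "))
        else:
            out.append(tok)
    return " ".join(out)

# Hypothetical five-class setup matching the reported experiments.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=5
)

def classify(text: str) -> int:
    """Return the predicted topic index for one document."""
    inputs = tokenizer(
        clean_text(text), truncation=True, max_length=128, return_tensors="pt"
    )
    with torch.no_grad():
        logits = model(**inputs).logits
    return int(logits.argmax(dim=-1))
```

In practice the classification head would be fine-tuned on the augmented dataset before classify is used; the snippet only fixes the interface of the preprocessing, augmentation, and prediction steps.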
