Building an enhanced sentiment classification framework based on natural language processing

Abstract

Sentiment classification is one of the major tasks of natural language processing (NLP) and has attracted considerable attention from researchers and businesses in recent years. However, the semantics of social networking language are becoming increasingly complex and unpredictable, which reduces the accuracy of NLP systems that process it. In this paper, we propose a hybrid sentiment analysis (SA) framework that classifies the opinions expressed in Vietnamese reviews as either positive or negative. The distinguishing feature of the proposed framework is that it combines three different text representation models that focus on the characteristics of social media language. Our system achieved an accuracy of 81.54% on the test set, outperforming the other strategies evaluated. The experimental results show that the choice of text representation model has a decisive effect on the performance of the system.
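To make the idea of combining text representations concrete, here is a minimal Python sketch. The abstract does not name all three representation models; the keywords mention bag-of-words and word2vec, so this sketch assumes only those two views and concatenates, for each review, a bag-of-words count vector with the average of its word vectors. The toy reviews (assumed to be already word-segmented, with multi-syllable words joined by `_`), the hand-written two-dimensional vectors in `EMB` (hypothetical stand-ins for trained word2vec embeddings), and the plain logistic-regression classifier are all illustrative assumptions, not the authors' configuration.

```python
import math
from collections import Counter

# Hypothetical toy data: word-segmented Vietnamese reviews (1 = positive, 0 = negative).
TRAIN = [
    ("sản_phẩm tốt", 1),
    ("rất tốt giao_hàng nhanh", 1),
    ("hàng kém chất_lượng", 0),
    ("rất tệ giao_hàng chậm", 0),
]

# Hypothetical 2-d vectors standing in for trained word2vec embeddings.
DIM = 2
EMB = {
    "tốt": [1.0, 0.0], "nhanh": [0.8, 0.1], "tuyệt_vời": [1.0, 0.0],
    "kém": [-1.0, 0.0], "tệ": [-0.9, 0.1], "chậm": [-0.8, 0.0], "dở": [-1.0, 0.0],
}


def build_vocab(docs):
    """Assign a column index to every distinct token seen in the training docs."""
    vocab = {}
    for doc in docs:
        for tok in doc.split():
            vocab.setdefault(tok, len(vocab))
    return vocab


def bag_of_words(doc, vocab):
    """Raw term counts over the fixed vocabulary; out-of-vocabulary tokens are dropped."""
    vec = [0.0] * len(vocab)
    for tok, count in Counter(doc.split()).items():
        if tok in vocab:
            vec[vocab[tok]] = float(count)
    return vec


def mean_embedding(doc, emb=EMB, dim=DIM):
    """Average the vectors of tokens that have an embedding; zeros if none do."""
    vecs = [emb[tok] for tok in doc.split() if tok in emb]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def hybrid_features(doc, vocab):
    """Concatenate the sparse count view and the dense embedding view."""
    return bag_of_words(doc, vocab) + mean_embedding(doc)


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(u, v):
    return sum(a * c for a, c in zip(u, v))


def train_logreg(X, y, lr=0.5, epochs=200):
    """Full-batch gradient descent on the mean logistic loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(dot(w, xi) + b) - yi  # d(loss)/d(logit) for one example
            for j, xj in enumerate(xi):
                gw[j] += err * xj
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(w, b, x):
    """Label 1 (positive) when the logit is non-negative, else 0 (negative)."""
    return 1 if dot(w, x) + b >= 0 else 0


vocab = build_vocab(doc for doc, _ in TRAIN)
X = [hybrid_features(doc, vocab) for doc, _ in TRAIN]
y = [label for _, label in TRAIN]
w, b = train_logreg(X, y)
print(predict(w, b, hybrid_features("tuyệt_vời", vocab)))
```

The word `tuyệt_vời` never occurs in the training reviews, so its bag-of-words part is all zeros and only the embedding part can carry its sentiment; covering such gaps is one motivation for combining representations. In a real system the vectors would come from a word2vec model trained on a large corpus rather than a hand-written dictionary.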

Keywords

Sentiment analysis; sentiment classification; natural language processing; bag-of-words; word2vec; text representation
