A weakly supervised approach to Chinese sentiment classification using partitioned self-training

Abstract

With the rapid growth of opinion-bearing documents on the World Wide Web, there is increasing demand for sentiment analysis techniques that can adapt easily to new domains with minimal supervision. This article introduces a novel weakly supervised approach to Chinese sentiment classification. The approach first applies a variant of the self-training algorithm to two partitions split from the test dataset. It then combines the classification results of the two partitions into a pseudo-labelled training set and an unlabelled test set, trains an initial classifier on the pseudo-labelled training set, and runs a standard self-learning cycle to obtain the overall classification results. Experiments on four datasets from two domains show that the approach compares favourably with the baseline approaches; on some of the datasets it even outperforms the supervised approach despite using no labelled documents.
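The final stage described above, a standard self-learning cycle started from a pseudo-labelled training set, can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration and is not taken from the article: the small Naive Bayes classifier, the confidence threshold, the round limit, and the function names `train_nb`, `predict_nb` and `self_train`. The partitioning stage that produces the pseudo-labels is not reproduced, because the abstract does not specify it. Documents are assumed to be token lists that have already been word-segmented, and labels are 0 (negative) or 1 (positive).

```python
import math
from collections import Counter


def train_nb(docs, labels):
    """Fit a multinomial Naive Bayes model (hypothetical stand-in classifier)."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter(labels)
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab


def predict_nb(model, tokens):
    """Return (label, confidence); confidence is the winning class posterior."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    log_scores = {}
    for label in (0, 1):
        n_words = sum(word_counts[label].values())
        # Add-one smoothing on both the class prior and the word likelihoods.
        score = math.log((class_counts[label] + 1) / (total_docs + 2))
        for token in tokens:
            if token in vocab:  # tokens never seen in training are ignored
                score += math.log(
                    (word_counts[label][token] + 1) / (n_words + len(vocab))
                )
        log_scores[label] = score
    top = max(log_scores.values())
    probs = {label: math.exp(s - top) for label, s in log_scores.items()}
    norm = sum(probs.values())
    best = max(probs, key=probs.get)
    return best, probs[best] / norm


def self_train(seed_docs, seed_labels, unlabelled, threshold=0.9, max_rounds=10):
    """Standard self-training: promote confident predictions, then retrain."""
    docs, labels = list(seed_docs), list(seed_labels)
    remaining = list(range(len(unlabelled)))
    for _ in range(max_rounds):
        model = train_nb(docs, labels)
        still_unsure = []
        for i in remaining:
            label, confidence = predict_nb(model, unlabelled[i])
            if confidence >= threshold:
                docs.append(unlabelled[i])
                labels.append(label)
            else:
                still_unsure.append(i)
        if len(still_unsure) == len(remaining):
            break  # nothing new was promoted in this round
        remaining = still_unsure
    final_model = train_nb(docs, labels)
    return [predict_nb(final_model, tokens)[0] for tokens in unlabelled]


# Toy usage with English tokens standing in for segmented Chinese words.
pseudo_docs = [["good", "great"], ["bad", "awful"]]
pseudo_labels = [1, 0]
test_docs = [["good", "nice"], ["awful", "poor"], ["nice", "great"]]
predictions = self_train(pseudo_docs, pseudo_labels, test_docs, threshold=0.6)
```

In this sketch a document is promoted to the training set only when the classifier's posterior for its predicted class reaches the threshold, and the loop stops as soon as a round promotes nothing. Both choices are common defaults for self-training and may differ from the stopping and selection rules used in the article.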

Keywords

opinion mining, self-training, sentiment classification, weakly supervised
