Building Normalized SentiMI to enhance semi-supervised sentiment analysis

Abstract

Sentiment analysis, or polarity detection, is a type of text classification in which natural-language opinions are analyzed and classified as either positive or negative. Classifying text into sentiment labels is difficult because opinions expressed in natural language may contain abbreviations, slang, sarcasm, irony or idioms. This research focuses on the use of SentiWordNet 3.0 as a labeled corpus for training. We present a complete framework based on a dictionary named Normalized SentiMI (nSentiMI), which is created by calculating the point-wise mutual information for each term/part-of-speech pair extracted from SentiWordNet. The framework is applied to a dataset of 50,000 movie reviews to identify the value of a weight factor α and is then evaluated on an unseen test dataset of 2,000 movie reviews. Comparison with state-of-the-art techniques also confirms the superiority of the proposed approach.
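As a rough illustration of the point-wise mutual information step mentioned above, the sketch below computes the textbook PMI, log2(P(x, c) / (P(x) P(c))), between a term/part-of-speech pair x and a polarity class c. It is not the nSentiMI construction itself: the abstract does not state the normalization or how the weight factor α enters, so neither is shown. The function name `pmi_table` and the toy observations are hypothetical and are not taken from SentiWordNet.

```python
import math
from collections import Counter


def pmi_table(labeled_pairs):
    """Return {((term, pos), label): PMI} for every observed co-occurrence.

    labeled_pairs: iterable of ((term, pos), label) observations.
    PMI(x, c) = log2( P(x, c) / (P(x) * P(c)) ), estimated from raw counts.
    """
    labeled_pairs = list(labeled_pairs)
    n = len(labeled_pairs)
    joint = Counter(labeled_pairs)                        # counts of (pair, label)
    pair_counts = Counter(pair for pair, _ in labeled_pairs)
    label_counts = Counter(label for _, label in labeled_pairs)

    table = {}
    for (pair, label), count in joint.items():
        p_joint = count / n
        p_pair = pair_counts[pair] / n
        p_label = label_counts[label] / n
        table[(pair, label)] = math.log2(p_joint / (p_pair * p_label))
    return table


# Hypothetical toy observations (illustrative only, not SentiWordNet data).
toy = [
    (("good", "a"), "pos"),
    (("good", "a"), "pos"),
    (("good", "a"), "neg"),
    (("bad", "a"), "neg"),
]
scores = pmi_table(toy)
```

In this toy data, a positive score means the pair occurs with the class more often than independence would predict, and a negative score means less often. Unobserved combinations are simply absent from the table; a real dictionary would need a policy (for example, smoothing) for them.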

Keywords

SentiWordNet, mutual information, sentiment analysis, social media, text mining, movie reviews

