Arabic senti-lexicon: Constructing publicly available language resources for Arabic sentiment analysis

Abstract

Sentiment analysis is one of the most active recent research fields in natural language processing, driven by the rapidly growing volume of opinion data on the Web. Most approaches in this field focus on English because sentiment resources are lacking for other languages, including Arabic and its many dialects. Good sentiment resources play a critical role in most sentiment analysis applications. This article therefore introduces several publicly available sentiment analysis resources for Arabic. It introduces the Arabic senti-lexicon, a list of 3880 positive and negative synsets annotated with their part of speech, polarity scores, dialect synsets and inflected forms. It also presents a Multi-domain Arabic Sentiment Corpus (MASC) of 8860 positive and negative reviews from different domains. An in-depth study of five types of feature sets was conducted to identify effective features and to investigate their effect on the performance of Arabic sentiment analysis. The aim is to assess the quality of the developed language resources and to integrate different feature sets and classification algorithms into a more accurate sentiment analysis method. The Arabic senti-lexicon is used to generate feature vectors. Five well-known machine learning algorithms (naïve Bayes, k-nearest neighbours, support vector machines (SVMs), logistic linear regression and neural networks) are employed as base classifiers for each of the feature sets. A wide range of comparative experiments on standard Arabic data sets was conducted; the results are discussed and conclusions are drawn. The experimental results show that the Arabic senti-lexicon is a very useful resource for Arabic sentiment analysis. Moreover, classifiers trained on feature vectors derived from the corpus using the Arabic sentiment lexicon are more accurate than classifiers trained on the raw corpus.
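As a rough illustration of how a polarity lexicon can turn a review into a feature vector, the following minimal sketch may help. It is not the authors' method: the four lexicon entries and their scores are hypothetical placeholders rather than entries from the Arabic senti-lexicon, whitespace tokenisation stands in for proper Arabic preprocessing, and the three features (positive count, negative count, summed polarity) are an assumed, simplified feature set.

```python
from typing import Dict, List

# Hypothetical toy lexicon (word -> polarity score in [-1, 1]).
# These entries and scores are illustrative assumptions only.
TOY_LEXICON: Dict[str, float] = {
    "جميل": 0.8,    # "beautiful"
    "ممتاز": 0.9,   # "excellent"
    "سيء": -0.7,    # "bad"
    "رديء": -0.6,   # "poor"
}


def lexicon_features(text: str, lexicon: Dict[str, float]) -> List[float]:
    """Map a review to [positive count, negative count, summed polarity]."""
    # Naive whitespace tokenisation; real Arabic text would need
    # normalisation and stemming or lemmatisation first.
    tokens = text.split()
    scores = [lexicon[tok] for tok in tokens if tok in lexicon]
    positive = sum(1 for s in scores if s > 0)
    negative = sum(1 for s in scores if s < 0)
    return [float(positive), float(negative), float(sum(scores))]


# Example review: "The hotel is beautiful but the food is bad".
review = "الفندق جميل لكن الطعام سيء"
features = lexicon_features(review, TOY_LEXICON)
```

Vectors of this kind could then be passed to any of the base classifiers named in the abstract in place of raw bag-of-words counts.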

Keywords

Arabic Sentiment Corpus; Arabic sentiment lexicon; feature set; senti-lexicon; sentiment analysis
