Unsupervised event exploration from social text streams

Abstract

Social media provides unprecedented opportunities for people to disseminate information and share their opinions and views online. Extracting events from social media platforms such as Twitter could help in understanding what is being discussed. However, event extraction from social text streams poses substantial challenges due to the noisy nature of social media posts and the dynamic evolution of language. We propose a generic unsupervised framework for exploring events on Twitter which consists of four major steps: filtering, pre-processing, extraction and categorization, and post-processing. Tweets published in a certain time period are aggregated, and noisy tweets which do not contain newsworthy events are removed in the filtering step. The remaining tweets are pre-processed by temporal resolution, part-of-speech tagging and named entity recognition in order to identify the key elements of events. An unsupervised Bayesian model is proposed to automatically extract structured representations of events in the form of quadruples $\langle$entity, keyword, date, location$\rangle$ and further categorize the extracted events into event types. Finally, in the post-processing step, the categorized events are assigned event type labels without human intervention. The proposed framework has been evaluated on over 60 million tweets collected over one month in December 2010. A precision of 78.01% is achieved for event extraction using our proposed Bayesian model, outperforming a competitive baseline by nearly 13.6%. Moreover, events are also clustered into coherent groups with the automatically assigned event type labels, with an accuracy of 42.57%.
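
To make the quadruple representation concrete, the following is a minimal sketch of how pre-processed tweets might be turned into events of the form entity, keyword, date, location. It is not the Bayesian model proposed here: the grouping heuristic, the `Event` type, the `extract_events` function, the `min_support` threshold, the record keys and the sample tweets are all hypothetical and chosen purely for illustration.

```python
from collections import Counter, defaultdict
from typing import Iterable, List, NamedTuple


class Event(NamedTuple):
    """Structured event representation: <entity, keyword, date, location>."""
    entity: str
    keyword: str
    date: str
    location: str


def extract_events(tweets: Iterable[dict], min_support: int = 2) -> List[Event]:
    """Naive stand-in for the extraction step (NOT the Bayesian model).

    Each tweet is a hypothetical pre-processed record with the keys
    'entity', 'date', 'location' and 'keywords' (a list of strings).
    Tweets sharing entity, date and location are grouped, and the most
    frequent keyword in each group is taken as the event keyword.
    """
    keyword_counts = defaultdict(Counter)  # group key -> keyword frequencies
    support = Counter()                    # group key -> number of tweets
    for tweet in tweets:
        key = (tweet["entity"], tweet["date"], tweet["location"])
        keyword_counts[key].update(tweet["keywords"])
        support[key] += 1

    events = []
    for key, counts in keyword_counts.items():
        # Crude filtering: skip groups that are mentioned too rarely
        # or that carry no keywords at all.
        if support[key] < min_support or not counts:
            continue
        keyword, _ = counts.most_common(1)[0]
        entity, date, location = key
        events.append(Event(entity, keyword, date, location))
    return events


# Made-up sample input, for illustration only.
tweets = [
    {"entity": "Acme Corp", "date": "2010-12-01", "location": "London",
     "keywords": ["launch", "delay"]},
    {"entity": "Acme Corp", "date": "2010-12-01", "location": "London",
     "keywords": ["launch"]},
    {"entity": "lunch", "date": "2010-12-01", "location": "home",
     "keywords": ["eat"]},
]
events = extract_events(tweets)
```

In this sketch the single-tweet group is discarded by the support threshold, which only loosely mirrors the role of the filtering step; an unsupervised model would instead infer the grouping of tweets into events jointly with the event elements.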

Keywords

Social media, event extraction, Bayesian model, unsupervised learning

