Topical tags vs non-topical tags: Towards a bipartite classification?

Abstract

In this paper we investigate whether a computational approach can distinguish topical tags (i.e. tags that describe the topic of a resource) from non-topical tags (i.e. tags that describe aspects of a resource unrelated to its topic) in folksonomies, in a way that correlates with human judgement. To this end, we collected 21 million tags (1.2 million unique terms) from Delicious and developed an unsupervised statistical algorithm that classifies these tags by applying a word space model adapted to the folksonomy space. Our algorithm analyses the network of tags that co-occur with a target tag and exploits graph-based metrics to classify the target tag. We validated its outcomes in three separate tests against a reference classification made by humans on a limited number of terms. The analysis shows, in some cases, a consistent disagreement among humans, and between humans and our algorithm, about what constitutes a topical tag, and suggests the emergence of a new category of overly generic tags (i.e. umbrella tags).
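
The idea of analysing the co-occurrence network around a target tag can be illustrated with a minimal sketch. It is not the authors' algorithm, which the abstract does not specify in detail. It assumes that each bookmark is a set of tags, that two tags are linked when they appear on the same bookmark, and it uses the local clustering coefficient as one example of a graph-based metric. The function names and the toy data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence(posts):
    """Map each tag to the set of tags it shares at least one post with."""
    neighbours = defaultdict(set)
    for tags in posts:
        for a, b in combinations(sorted(set(tags)), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return neighbours


def local_clustering(neighbours, target):
    """Fraction of pairs of the target's neighbours that also co-occur."""
    nbrs = neighbours.get(target, set())
    if len(nbrs) < 2:
        return 0.0
    pairs = list(combinations(sorted(nbrs), 2))
    linked = sum(1 for a, b in pairs if b in neighbours[a])
    return linked / len(pairs)


# Toy posts (hypothetical data): each post is the tag set of one bookmark.
posts = [
    {"python", "programming", "tutorial"},
    {"python", "programming", "toread"},
    {"cooking", "recipes", "toread"},
    {"travel", "italy", "toread"},
]
graph = build_cooccurrence(posts)
python_score = local_clustering(graph, "python")
toread_score = local_clustering(graph, "toread")
```

In this toy data the neighbours of "toread" come from unrelated bookmarks and rarely co-occur with one another, so its score is lower than that of "python". Whether such a metric actually separates topical from non-topical tags on real folksonomy data is an empirical question that this sketch does not answer.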

Keywords

Delicious; folksonomy; latent semantic analysis; topicality and non-topicality of tags; umbrella tags; user testing session

