A semi-hierarchical clustering method for constructing knowledge trees from stackoverflow

Abstract

To help students learn to program, we need to give them a clear knowledge map and sufficient materials. Question-based websites, such as stackoverflow, are excellent information sources for this goal. However, beginners may find such sites difficult to use, because without sufficient background knowledge they may not know how to ask the right questions; a knowledge tree is usually considered more helpful in such a scenario. In this research, we propose a method that automatically infers a knowledge tree from this type of website and groups documents based on the resulting tree. The proposed method mainly addresses two issues: first, the quality of tags cannot be guaranteed, and second, clustering-based methods usually generate a flat schema. The occurrence count and the co-occurrence ratio were used together to identify important tags. Then, an algorithm was developed to infer the hierarchical relationships between tags. When these tags are used as cluster centres, clustering performance is better than with k-means alone.
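The abstract does not give the exact rules, so the following is only a minimal sketch of how occurrence counts and a co-occurrence ratio could be combined to pick important tags and guess a parent for each one. The function names, the thresholds `min_count` and `min_ratio`, the parent-selection rule and the sample posts are all assumptions made for illustration, not the authors' algorithm. The later step of using the selected tags as cluster centres is not shown.

```python
from collections import Counter
from itertools import combinations


def tag_statistics(posts):
    """Count how often each tag occurs and how often each pair of tags co-occurs."""
    occurrence = Counter()
    co_occurrence = Counter()
    for tags in posts:
        unique = sorted(set(tags))  # sorted so each pair has one canonical order
        occurrence.update(unique)
        co_occurrence.update(combinations(unique, 2))
    return occurrence, co_occurrence


def infer_parents(posts, min_count=2, min_ratio=0.6):
    """Map each important tag to a guessed parent tag (None marks a root).

    Assumed rule: a tag is important if it occurs at least min_count times;
    its parent is the strictly more frequent important tag that appears in
    the largest share (at least min_ratio) of the child's posts.
    """
    occurrence, co_occurrence = tag_statistics(posts)
    important = sorted(t for t, n in occurrence.items() if n >= min_count)
    parents = {}
    for child in important:
        best_parent, best_ratio = None, min_ratio
        for candidate in important:
            if candidate == child or occurrence[candidate] <= occurrence[child]:
                continue
            pair = tuple(sorted((child, candidate)))
            ratio = co_occurrence[pair] / occurrence[child]
            if ratio >= best_ratio:
                best_parent, best_ratio = candidate, ratio
        parents[child] = best_parent
    return parents


# Hypothetical tag lists, one per question.
posts = [
    ["python", "pandas"],
    ["python", "pandas", "dataframe"],
    ["python", "numpy"],
    ["python", "numpy"],
    ["python"],
    ["java", "spring"],
    ["java", "spring"],
    ["java"],
]
tree = infer_parents(posts)
```

In this toy data, `dataframe` occurs only once and is filtered out as a low-quality tag, while `pandas` and `numpy` attach to `python` and `spring` attaches to `java`. The two thresholds control how aggressively rare or loosely related tags are dropped.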

Keywords

Clustering, folksonomy, knowledge tree, tags

