Linguistic summaries of graph datasets using ontologies: An application to Semantic Web

Abstract

This paper presents an approach to generating linguistic summaries of graph datasets, with particular focus on the use of ontologies. Linguistic summarization is a well-known data mining technique based on fuzzy set theory, which models natural-language words (e.g. ‘many’, ‘tall’) and, as a result, generates natural-language-like sentences describing the data. Although the technique has been developed intensively, before our work it had been applied only to relational databases, while more and more data is available in the graph model. A special case of such graph datasets is the Semantic Web, in which ontologies provide meaning, thereby enabling advanced machine learning. In this paper we analyze the problem of generating linguistic summaries for graph data with associated ontologies, a case to which the method cannot be applied directly. The key elements of ontologies are concept hierarchies, which are the core of our work. First, because graph data is heterogeneous and lacks a schema, we propose using an ontological concept (including all its sub-concepts in the hierarchy) as the subject of summaries, and extracting its attributes (neighboring vertices). We then show that ascending these ontological concept hierarchies (that is, attribute-based induction) yields additional, generalized summaries. We show this process for both summarizers and qualifiers, and propose extensions to their respective imprecision measures, T₂ and T₉. We perform two experiments on DBpedia: one for the summary subject ‘Artist’ and a second for ‘Musical Album’. For the latter, we show an optimized process of obtaining the truth values using a bottom-up approach.
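As a rough illustration of the idea of ascending a concept hierarchy, the following minimal Python sketch computes the truth value of summaries of the form "Q albums have genre C" at several levels of a toy hierarchy. It is not the authors' implementation: the hierarchy `PARENT`, the sample `albums` list, the helper names, and the piecewise-linear shape chosen for the quantifier ‘most’ are all assumptions made for this example, and the summarizer is taken to be crisp (an album either falls under a concept or not).

```python
from collections import Counter

# Hypothetical toy hierarchy (child concept -> parent concept); not taken from DBpedia.
PARENT = {
    "Hard rock": "Rock",
    "Punk rock": "Rock",
    "Bebop": "Jazz",
    "Rock": "Genre",
    "Jazz": "Genre",
}


def ramp(lo, hi):
    """Return a non-decreasing fuzzy quantifier: 0 up to lo, 1 from hi, linear between."""
    def mu(x):
        if x <= lo:
            return 0.0
        if x >= hi:
            return 1.0
        return (x - lo) / (hi - lo)
    return mu


def rolled_up_counts(values):
    """Count records per concept bottom-up: each value also counts for all its ancestors."""
    totals = Counter()
    for concept, n in Counter(values).items():
        node = concept
        while node is not None:
            totals[node] += n
            node = PARENT.get(node)  # None once the root has been passed
    return totals


def truth(quantifier, concept, values):
    """Truth of 'Q records are <concept>' as quantifier(proportion of matching records)."""
    totals = rolled_up_counts(values)
    return quantifier(totals[concept] / len(values))


# Assumed shape for the relative quantifier 'most'.
most = ramp(0.3, 0.8)

# Hypothetical sample: the genre attached to each of ten albums.
albums = ["Hard rock"] * 5 + ["Punk rock"] * 2 + ["Bebop"] * 3

for concept in ("Hard rock", "Rock", "Genre"):
    print(f"Most albums are {concept}: {truth(most, concept, albums):.2f}")
```

In this toy data the summary about ‘Hard rock’ is only weakly true, while the generalized summary about its parent concept ‘Rock’ is more strongly true, because the counts of the sub-concepts are accumulated as the hierarchy is ascended.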

Keywords

Linguistic summaries, fuzzy logic, ontology, Semantic Web
