Abstract
This paper proposes a method for learning ontologies given a corpus of text documents. The method identifies concepts in documents and organizes them into a subsumption hierarchy, without presupposing the existence of a seed ontology. The method uncovers latent topics for generating document text. The discovered topics form the concepts of the new ontology. Concept discovery is done in a language neutral way, using probabilistic space reduction techniques over the original term space of the corpus. Furthermore, the proposed method constructs a subsumption hierarchy of the concepts by performing conditional independence tests among pairs of latent topics, given a third one. The paper provides experimental results on the Genia and the Lonely Planet corpora from the domains of molecular biology and tourism respectively.
Keywords
Get full access to this article
View all access options for this article.
