Abstract
Rough Sets provide a mathematical tool for decision making under uncertainty. One major domain characterized by inherent ambiguity is natural language text, where this ambiguity often leads to uncertainty in understanding the intent of a sentence and its relative importance with respect to its context in the whole text. Consequently, selecting sentences to generate an extractive summary can reasonably be viewed as a process of decision making under uncertainty. In this paper we use Rough Set based techniques to deal with this uncertainty. The contribution of this paper is two-fold. First, it proposes a novel Rough Set based uncertainty measure called span and defines special Rough subsets of the universe called spanning sets. Span is a Rough Set based measure of the salience of a subset of the universe, and a spanning set is a subset that maximizes the span. Spanning sets correspond to the key elements representing a problem and can be used to solve various real-life problems. Second, these concepts are applied to determine extracts of text documents, the aim being to find the most suitable subset(s) of the universe of sentences under consideration. An optimization problem based on the proposed span measure is formulated to generate the extract of a given text, and it is solved using Particle Swarm Optimization. Experimental results on the DUC2001 and DUC2002 single-document data sets and the Enron Email datasets establish the effectiveness of the proposed technique. Although there has been substantial work on Rough Sets, considering a stochastic Rough subset of the universe and determining its aptness as a representative of the universe remains unexplored. The proposed technique is a novel effort to fill this gap.
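The span measure itself is not defined in this abstract, so the sketch below does not implement it. It only illustrates the standard Pawlak machinery that Rough Set based measures build on: sentences are grouped into equivalence classes by an indiscernibility relation (identical feature tuples), and a candidate subset of sentences is described by its lower and upper approximations. The feature tuples, the sentence ids, and the use of the classical accuracy of approximation |lower| / |upper| as a stand-in salience score are all illustrative assumptions, not the paper's method.

```python
from collections import defaultdict


def equivalence_classes(universe, attrs):
    """Partition the universe by indiscernibility: items whose attribute
    tuples are identical fall into the same equivalence class."""
    classes = defaultdict(set)
    for x in universe:
        classes[attrs[x]].add(x)
    return list(classes.values())


def approximations(universe, attrs, target):
    """Return the (lower, upper) Pawlak approximations of `target`."""
    lower, upper = set(), set()
    for block in equivalence_classes(universe, attrs):
        if block <= target:   # class lies entirely inside the target
            lower |= block
        if block & target:    # class overlaps the target
            upper |= block
    return lower, upper


def accuracy(lower, upper):
    """Classical accuracy of approximation; 1.0 for an empty target
    is a convention assumed here."""
    return len(lower) / len(upper) if upper else 1.0


# Hypothetical sentence features, e.g. (contains_keyword, position_bucket).
sentences = ["s1", "s2", "s3", "s4", "s5"]
features = {
    "s1": (1, 0),
    "s2": (1, 0),
    "s3": (0, 1),
    "s4": (0, 1),
    "s5": (1, 1),
}
candidate = {"s1", "s2", "s3"}  # a candidate extract

lower, upper = approximations(sentences, features, candidate)
print(sorted(lower), sorted(upper), accuracy(lower, upper))
# prints: ['s1', 's2'] ['s1', 's2', 's3', 's4'] 0.5
```

Here s3 and s4 are indiscernible, so a candidate containing only s3 is "rough": s3 appears in the upper approximation but not the lower one. A search procedure such as Particle Swarm Optimization would then explore candidate subsets to maximize whatever salience measure is chosen; the paper's span measure would take the place of `accuracy` in that role.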
Keywords
