Abstract
In this paper we present various methods for estimating the K-number, the number of distinct entities carrying the same name in a corpus, together with an analysis of their characteristics and their impact on the person cross-document coreference (PCDC) task. There are two important classes of such methods: corpus-based and external-resource-based. The experiments reported here show that estimating the K-number plays an important role in PCDC, from understanding the complexity of the task to improving the overall accuracy of coreference.
