Abstract
Detecting similar questions is a fundamental research problem for constructing similar-question datasets used in research on question answering, short-text similarity calculation, and sentence paraphrasing. This paper examines a previous assumption about similar question detection and analyzes its problems. We then propose an automated approach to detecting similar questions based on calculating question topical diversity, using several topical feature generation methods. The experimental dataset consists of 4,482,757 questions with answers from Yahoo!. The results show that our approach achieves a best-case precision of 74% and recall of 74%, outperforming the baseline methods, which demonstrates its effectiveness in detecting similar question groups.
