SimPaD: A word-similarity sentence-based plagiarism detection tool on Web documents

Abstract

Plagiarism is a serious problem that infringes copyrighted documents/materials, which is an unethical practice and decreases the economic incentive received by their legal owners. Unfortunately, plagiarism is getting worse due to the increasing number of on-line publications and easy access on the Web, which facilitates locating and paraphrasing information. In solving this problem, we propose a novel plagiarism-detection method, called SimPaD, which (i) establishes the degree of resemblance between any two documents D₁ and D₂ based on their sentence-to-sentence similarity computed by using pre-defined word-correlation factors, and (ii) generates a graphical view of sentences that are similar (or the same) in D₁ and D₂. Experimental results verify that SimPaD is highly accurate in detecting (non-)plagiarized documents and outperforms existing plagiarism-detection approaches.

Keywords

Plagiarism word-correlation factor sentence similarity word manipulation graphical view

Get full access to this article

View all access options for this article.