Abstract
Plagiarism is common in English writing exams. Researchers classify it into copy-paste plagiarism and text-rewriting plagiarism, but existing models suffer from problems such as a single detection approach and unsatisfactory results. To address copy-paste plagiarism in English writing, this paper proposes a digital fingerprint model based on an N-Gram window jumping mechanism. The model incorporates a sliding window and an improved matching tool to solve the problems of excessive fingerprint density and low detection efficiency during text fingerprint extraction. The model also adds a Fisher-Yates shuffle algorithm with a salt parameter to resolve hash collisions during text matching. Experiments show that the model can detect copy-paste plagiarism in English compositions. For text-rewriting plagiarism, this paper designs a TextCNN-BiGRU model that combines TextCNN and BiGRU so that the extracted semantic information captures both the local and the contextual features of the text. Experiments show that, on the MRPC dataset, this model improves accuracy by 1.9 percentage points and F1 score by 1.2 percentage points compared with other models.
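The abstract does not say how the window jump, the matching step, or the salted shuffle are implemented, so the following Python sketch shows only one plausible arrangement under our own assumptions. Word trigrams are hashed with a 64-bit Pearson-style hash whose 256-entry lookup table comes from a salt-seeded Fisher-Yates shuffle. A winnowing-style window keeps the minimum hash in each window, and the window start advances by a `jump` step instead of one position, which thins the fingerprint. Two texts are compared by the Jaccard similarity of their fingerprint sets. All function names and parameter values (`salted_table`, `pearson64`, `fingerprint`, `similarity`, `n`, `window`, `jump`, `salt`) are hypothetical and are not taken from the paper.

```python
import random


def salted_table(salt: str) -> list[int]:
    """Fisher-Yates shuffle of 0..255, seeded by a salt (assumed design)."""
    rng = random.Random(salt)  # same salt -> same table
    table = list(range(256))
    for i in range(255, 0, -1):
        j = rng.randint(0, i)  # pick 0 <= j <= i
        table[i], table[j] = table[j], table[i]
    return table


def pearson64(data: bytes, table: list[int]) -> int:
    """64-bit Pearson hash: eight 8-bit passes, each with a different first byte."""
    result = 0
    for k in range(8):
        h = table[(data[0] + k) % 256]
        for b in data[1:]:
            h = table[h ^ b]
        result = (result << 8) | h
    return result


def ngrams(text: str, n: int) -> list[str]:
    """Word-level n-grams of a lower-cased text."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def fingerprint(text: str, n: int = 3, window: int = 4, jump: int = 2,
                salt: str = "demo-salt") -> set[int]:
    """Keep the minimum hash per window; window starts advance by `jump`."""
    table = salted_table(salt)
    hashes = [pearson64(g.encode("utf-8"), table) for g in ngrams(text, n)]
    if not hashes:
        return set()
    last_start = max(len(hashes) - window, 0)
    starts = list(range(0, last_start + 1, jump))
    if starts[-1] != last_start:
        starts.append(last_start)  # always cover the tail of the text
    return {min(hashes[s:s + window]) for s in starts}


def similarity(a: str, b: str, **kw) -> float:
    """Jaccard similarity of two fingerprint sets."""
    fa, fb = fingerprint(a, **kw), fingerprint(b, **kw)
    if not fa and not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)


if __name__ == "__main__":
    essay = "the quick brown fox jumps over the lazy dog near the river bank today"
    print(similarity(essay, essay + " and then it rests"))
```

A larger `jump` yields fewer fingerprints (lower density) but may miss short copied passages that fall between sampled windows. The sketch does not tune this trade-off. Changing `salt` changes every hash, which is one way a salted table could be used to re-hash when a collision is suspected.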
