Intrinsic plagiarism detection

Article title: An integrated approach for intrinsic plagiarism detection
Journal/Conference: Future Generation Computer Systems
Related fields of study: Computer Engineering
Related specializations: Information Security
Manuscript type: Research Article
Digital Object Identifier (DOI): https://doi.org/10.1016/j.future.2017.11.023
Affiliation: Faculty of Engineering, Environment & Computing, School of Computing, Electronics and Maths, Coventry University, United Kingdom
Pages (English article): 26
Publisher: Elsevier
Publication type: Journal
Article type: ISI
Year of publication: 2019
Impact factor: 7.007 (2018)
H-index: 93 (2019)
SJR: 0.835 (2018)
ISSN: 0167-739X
Quartile: Q1 (2018)
Table of contents

Abstract

1. Introduction

2. Previous work

3. Background

4. Proposed approach

5. Evaluation of the results

6. Conclusion

References

Excerpt from the article

Abstract

Employing effective plagiarism detection methods is seen as essential in the next-generation web. In this paper, we present a novel approach for plagiarism detection without reference collections. The proposed approach relies on statistical properties of the most common words, together with Latent Semantic Analysis, which is applied to extract the usage patterns of those words. The method aims to generate a model of an author’s “style” by revealing a set of features of authorship. The model generation procedure focuses on just one author, in an attempt to summarise the aspects of that author’s style in a definitive and clear-cut manner. The feature set of the intrinsic model was based on the frequency of the most common words, their relative frequencies in the book series, and the deviation of these frequencies across all books for a particular author. The approach has been evaluated using leave-one-out cross-validation on the CEN (Corpus of English Novels) data set. The results indicate that, by integrating deep latent semantic and stylometric analyses, hidden changes can be identified when a reference collection does not exist. The results also show that the Multi-Layer Perceptron based approach statistically outperforms Bayesian Network, Support Vector Machine and Random Forest models, predicting the author classes with an overall accuracy of 97%.
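
The stylometric features named in the abstract (frequency of the most common words, their relative frequencies per book, and the deviation of those frequencies across an author's books) can be illustrated with a small sketch. This is not the authors' implementation: the function names `tokenize` and `common_word_features`, the simple regular-expression tokenizer, the cutoff `k`, and the use of the population standard deviation as the "deviation" are all assumptions made for illustration, and the Latent Semantic Analysis and Multi-Layer Perceptron stages are left out.

```python
import re
from collections import Counter
from statistics import pstdev


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def common_word_features(books, k=3):
    """Hypothetical helper: style features for one author's books.

    Returns a tuple (vocab, rel, spread):
      vocab  - the k most common words over all books
      rel    - one row per book with the relative frequency of each vocab word
      spread - population standard deviation of each word's relative
               frequency across the books (assumed meaning of "deviation")
    """
    tokenized = [tokenize(book) for book in books]
    totals = Counter(tok for toks in tokenized for tok in toks)
    # Sort by count (descending), then alphabetically so ties are deterministic.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    vocab = [word for word, _ in ranked[:k]]

    rel = []
    for toks in tokenized:
        counts = Counter(toks)
        n = len(toks) or 1  # avoid division by zero for an empty book
        rel.append([counts[word] / n for word in vocab])

    # Transpose the rows so each column holds one word's frequencies per book.
    spread = [pstdev(column) for column in zip(*rel)]
    return vocab, rel, spread


if __name__ == "__main__":
    books = ["the cat and the dog", "the bird and the fish and the frog"]
    vocab, rel, spread = common_word_features(books, k=2)
    print(vocab)
    print(rel)
    print(spread)
```

In a fuller pipeline, rows like `rel` would be one input to a classifier; the abstract reports that a Multi-Layer Perceptron gave the best results, but how the features are combined with the latent semantic ones is not specified in this excerpt.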