Code authorship identification using convolutional neural networks

Persian title of the article: شناسایی تالیف کد با استفاده از شبکه های عصبی پیچشی
English title of the article: Code authorship identification using convolutional neural networks
Journal/Conference: Future Generation Computer Systems
Related fields of study: Computer Engineering
Related specializations: Software Engineering, Computer Programming, Information Security, Artificial Intelligence
Persian keywords: شناسایی تألیف کد، ویژگیهای برنامه حریم خصوصی، شبکه عصبی پیچشی، شناسایی یادگیری عمیق، forensics نرم افزار و امنیت
English keywords: Code authorship identification, Program features privacy, Convolutional neural network, Deep learning identification, Software forensics and security
Article category: Research Article
Indexing: Scopus - Master Journals List - JCR
Digital identifier (DOI): https://doi.org/10.1016/j.future.2018.12.038
University: Computer Engineering Department, INHA University, Incheon, South Korea
Pages in the English article: 12
Publisher: Elsevier
Presentation type: Journal
Article type: ISI
Year of publication: 2019
Impact factor: 7.007 in 2018
H-index: 93 in 2019
SJR: 0.835 in 2018
ISSN: 0167-739X
Quartile: Q1 in 2018
Format of the English article: PDF
Translation status: Not translated
Price of the English article: Free
Is this a base article: No
Does this article have a conceptual model: No
Does this article have a questionnaire: No
Does this article have variables: No
Product code: E11546
References: Cited in the text and listed at the end of the article
Table of contents (English)

Abstract

1- Introduction

2- Related work

3- Theoretical background

4- CNN-based code authorship identification systems

5- Experiment and evaluation

6- Limitations

7- Conclusion

References

Excerpt from the article (English)

Abstract

Although source code authorship identification creates a privacy threat for many open source contributors, it is an important topic in the forensics field and enables many successful forensic applications, including ghostwriting detection, copyright dispute settlement, and other code analysis applications. This work proposes a convolutional neural network (CNN) based code authorship identification system. The proposed system exploits term frequency-inverse document frequency (TF-IDF), word embedding modeling, and feature learning techniques for code representation. This representation is then fed into a CNN-based code authorship identification model to identify the code’s author. Evaluation of our approach on data from Google Code Jam demonstrates an identification accuracy of up to 99.4% with 150 candidate programmers, and 96.2% with 1,600 programmers. The evaluation also shows high accuracy for programmer identification over real-world code samples from 1,987 public repositories on GitHub, with 95% accuracy for 745 C programmers and 97% for the C++ programmers. These results indicate that the proposed approach is not language-specific and can identify programmers working in different programming languages.
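The abstract names TF-IDF as one of the code representations but this excerpt does not give the details. The following is a minimal sketch of what a TF-IDF weighting over code tokens could look like, not the authors' implementation. The regex tokenizer, the unsmoothed `tf * log(N / df)` formula, and the names `tokenize`, `tfidf`, and `samples` are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(source):
    """Split code into identifiers, numbers, and single punctuation marks (assumed scheme)."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", source)


def tfidf(samples):
    """Return one {token: weight} dict per sample, using tf * log(N / df)."""
    docs = [Counter(tokenize(s)) for s in samples]
    n_docs = len(docs)

    # Document frequency: in how many samples each token appears.
    df = Counter()
    for counts in docs:
        df.update(counts.keys())

    vectors = []
    for counts in docs:
        total = sum(counts.values())
        vectors.append(
            {tok: (c / total) * math.log(n_docs / df[tok]) for tok, c in counts.items()}
        )
    return vectors


# Two hypothetical snippets that do the same thing in different styles.
samples = [
    "for (int i = 0; i < n; i++) total += a[i];",
    "for (int idx = 0; idx < n; ++idx) { sum += arr[idx]; }",
]
vectors = tfidf(samples)
```

Under this weighting, tokens shared by every sample (such as `for` and `int`) receive a weight of zero, while author-specific choices such as the loop variable name `i` versus `idx` receive positive weight. This is the kind of stylistic signal a downstream classifier could use.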

Introduction

Recently, the code authorship identification task has gained increased attention in the research community [1] due to its importance in software forensics. Code authorship identification is the process of identifying programmers based on their distinctive programming styles. Style depends on various factors, such as the programmer’s preferred way of writing code, variable naming, programming proficiency and experience, and the thought process used to solve a programming task. All of these factors help to extract specific features from a given piece of a programmer’s code, enabling authorship identification by assigning each piece to the programmer who wrote it. Advances in this field could therefore assist in several aspects of software forensics, such as software authorship disputes [2], code integrity investigations [3], code plagiarism detection [4], and copyright infringement [5]. Moreover, code authorship identification can be used to identify the programmers of malicious code. The success of code authorship identification depends on an effective feature extraction process that captures the distinctive characteristics of programmers’ coding styles. This process is challenging, since the “coding style” of a programmer could change when working in different environments or when following certain software engineering paradigms [6]. Being able to extract such features would enable accurate code authorship identification by assigning programmers to the input source code samples. This work investigates the capabilities of a convolutional neural network (CNN) to solve the code authorship identification problem.