طبقه بندی متن با مقیاس بزرگ
ترجمه نشده

طبقه بندی متن با مقیاس بزرگ

عنوان فارسی مقاله: طبقه بندی متن با مقیاس بزرگ با استفاده از شبکه عصبی پیچشی مبتنی بر دامنه: یک رویکرد یادگیری عمیق
عنوان انگلیسی مقاله: Large-Scale Text Classification Using Scope-Based Convolutional Neural Network: A Deep Learning Approach
مجله/کنفرانس: دسترسی – IEEE Access
رشته های تحصیلی مرتبط: مهندسی کامپیوتر، مهندسی فناوری اطلاعات
گرایش های تحصیلی مرتبط: مهندسی الگوریتم و محاسبات، هوش مصنوعی، شبکه های کامپیوتری
کلمات کلیدی فارسی: طبقه بندی متن، یادگیری عمیق، شبکه عصبی پیچشی، پیچش مبتنی بر دامنه، ویژگی محلی
کلمات کلیدی انگلیسی: Text classification, deep learning, convolutional neural network, scope-based convolution, local feature
نوع نگارش مقاله: مقاله پژوهشی (Research Article)
شناسه دیجیتال (DOI): https://doi.org/10.1109/ACCESS.2019.2955924
دانشگاه: Big Data Management and Analysis Laboratory of Urban Construction, Shenyang Jianzhu University, Shenyang 110168, China
صفحات مقاله انگلیسی: 11
ناشر: آی تریپل ای - IEEE
نوع ارائه مقاله: ژورنال
نوع مقاله: ISI
سال انتشار مقاله: 2019
ایمپکت فاکتور: 4.641 در سال 2018
شاخص H_index: 56 در سال 2019
شاخص SJR: 0.609 در سال 2018
شناسه ISSN: 2169-3536
شاخص Quartile (چارک): Q2 در سال 2018
فرمت مقاله انگلیسی: PDF
وضعیت ترجمه: ترجمه نشده است
قیمت مقاله انگلیسی: رایگان
آیا این مقاله بیس است: خیر
آیا این مقاله مدل مفهومی دارد: ندارد
آیا این مقاله پرسشنامه دارد: ندارد
آیا این مقاله متغیر دارد: ندارد
کد محصول: E14060
رفرنس: دارای رفرنس در داخل متن و انتهای مقاله
فهرست مطالب (انگلیسی)

Abstract

I. Introduction

II. Related Work

III. Problem Definition and Preliminaries

IV. Scope-Based Convolutional Neural Network

V. Experiments

Authors

Figures

References

بخشی از مقاله (انگلیسی)

Abstract

Text classification is one of the most important and typical tasks in Natural Language Processing (NLP) which can be applied for many applications. Recently, deep learning approaches has shown their advantages in solving text classification problem, in which Convolutional Neural Network (CNN) is one of the most successful model in the field. In this paper, we propose a novel deep learning approach for categorizing text documents by using scope-based convolutional neural network. Different from windowbased CNN, scope does not require the words that construct a local feature have to be contiguous. It can represent deeper local information of text data. We propose a large-scale scope-based convolutional neural network (LSS-CNN), which is based on scope convolution, aggregation optimization, and max pooling operation. Based on these techniques, we can gradually extract the most valuable local information of the text document. This paper also discusses how to effectively calculate the scope-based information and parallel training for large-scale datasets. Extensive experiments have been conducted on real datasets to compare our model with several state-of-the-art approaches. The experimental results show that LSS-CNN can achieve both effectiveness and good scalability on big text data.

Introduction

The task of text classification (a.k.a. text tagging, text filtering or text categorization) is a process of categorizing a text document into one or multiple predefined categories based on the content. Concretely, the target is to build a classifier which takes a text document as an input, then automatically assigns relevant labels according its content. These text documents can be emails, comments, or movie reviews. Accordingly the labels can be spam/non-spam, positive/negative/neutral or review scores. Text classification plays an important role in Natural Language Processing (NLP). It is widely adopted in many applications. For example, most of news services today needs to automatically organize a large volume of new articles every single day [1]. All the modern mail services provide a function to determine either a mail is a junk mail automatically [2]. Other applications include sentiment analysis [3], topic modelling [4], language translation [5], and intent detection [6], etc. Text classification is a challenge problem. The sparse, high dimensional, and existence of irrelevant or noisy characteristics of text make it a non-trivial job to develop a good classifier for large-scale text data. Because of its importance and challenge, a lot of methods range from traditional feature engineering classification methods [7]–[10] with hand-crafted features to emerging deep learning methods [1], [11]–[15] have been proposed to solve the problem.