Abstract
We propose a novel sequence-to-sequence model for multi-label text classification, based on a ‘‘parallel encoding, serial decoding’’ strategy. The model combines a convolutional neural network and self-attention in parallel as the encoder to extract fine-grained local neighborhood information and global interaction information from the source text. We design a hierarchical decoder to decode and generate the label sequence. Our method not only gives full consideration to the interpretable fine-grained information in the source text but also effectively utilizes this information to generate the label sequence. We conducted extensive comparative experiments on three datasets. The results show that the proposed model has significant advantages over state-of-the-art baseline models. In addition, our analysis demonstrates that our model is competitive with RNN-based Seq2Seq models and that it is more robust at handling datasets with a high label/sample ratio.
Introduction
Multi-label text classification [1], [2] is an important and challenging task in natural language processing (NLP); it is more complicated than single-label classification because labels often exhibit complex dependencies. A typical real-life example is a news website whose front-page articles are tagged with labels such as ‘‘politics’’, ‘‘economics’’, and ‘‘culture’’. The goal is to help users find the information they want without being presented with irrelevant content. As this is a significant NLP task, many methods have been proposed and have gradually achieved satisfactory performance. Binary relevance (BR) [3] is one of the earliest methods; it decomposes the task into multiple independent single-label classification problems, deliberately ignoring label dependencies, yet still achieves a certain level of performance. To capture label dependencies, the classifier chain (CC) [4] converts the task into a series of binary classification problems in which each classifier is conditioned on the predictions of the previous ones. Conditional random fields (CRF) [5] and conditional Bernoulli mixtures (CBM) [6] have also been utilized to handle label dependencies. However, the above methods are applicable only to small- or medium-scale datasets, which makes them difficult to apply to large-scale ones. With the development of neural networks, some neural models have been applied to this task and have achieved further improvements.
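The contrast between binary relevance and classifier chains can be made concrete with a minimal sketch using scikit-learn (not part of the paper's own method; the toy dataset and logistic-regression base classifier are illustrative assumptions):

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.multioutput import ClassifierChain

# Toy multi-label dataset: 200 samples, 5 candidate labels per sample.
X, Y = make_multilabel_classification(n_samples=200, n_classes=5,
                                      random_state=0)

# Binary relevance: one independent binary classifier per label,
# deliberately ignoring any dependencies between labels.
br = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)

# Classifier chain: each classifier in the chain also receives the
# predictions of the earlier classifiers as extra input features,
# which lets it model label dependencies.
cc = ClassifierChain(LogisticRegression(max_iter=1000),
                     random_state=0).fit(X, Y)

# Both predict a binary indicator vector (one decision per label).
print(br.predict(X[:1]).shape)  # (1, 5)
print(cc.predict(X[:1]).shape)  # (1, 5)
```

Both baselines scale linearly in the number of labels at training time, which is one reason they become impractical on datasets with very large label sets, as noted above.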