Abstract
Curriculum learning, in which training gradually proceeds from easy to difficult examples, has been applied to various tasks and has demonstrated better performance than other machine learning approaches. However, identifying the difficulty level of examples in advance often requires domain knowledge and is time-consuming. We dynamically determine the difficulty of examples from neural network outputs during training and propose a loss function that promotes training on difficult examples. Experimental results verify that the proposed method improves generalization across several datasets.
Introduction
Neural networks have demonstrated excellent classification performance on various datasets of images, audio, language, and other modalities. This performance has relied on the development of robust training methods such as fine-tuning (Hinton and Salakhutdinov, 2006; Mesnil et al., 2012; Yosinski et al., 2014) and generative adversarial networks (Goodfellow et al., 2014; Radford, Metz, and Chintala, 2015). Curriculum learning, proposed by Bengio et al. (2009), is another powerful training method, in which learning gradually proceeds from easy to difficult examples, aiming to resemble human learning. Its proponents successfully applied curriculum learning to the classification of geometric shapes and to language processing.

In this paper, we prioritize the classification of difficult examples over easy ones. We therefore focus the training on difficult examples and employ conventional curriculum learning (Bengio et al., 2009) to train the easy examples. A training strategy based on difficulty is easy to implement in neural networks because the classification outputs represent the degree of confidence, that is, the difficulty of the examples. To weight difficult examples more heavily than easy ones, we use a loss function weighted by the network outputs. Because the loss function is determined at each iteration, it reflects the varying difficulty of examples; we call the resulting method difficulty-weighted learning (DWL).

DWL is strongly related to expert systems because it automatically retrieves the difficulty level of examples through the devised loss function, whereas conventional methods such as curriculum learning (Bengio et al., 2009) require domain knowledge for each task. Furthermore, as DWL is built on neural networks, which are powerful intelligent systems, a DWL implementation can itself be regarded as an expert and intelligent system.
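To make the idea concrete, the following is a minimal PyTorch sketch of a per-example loss weighted by the network's own outputs. The (1 - p_target) weighting and the function name difficulty_weighted_loss are illustrative assumptions for this sketch, not necessarily the exact DWL formulation defined later in the paper.

import torch
import torch.nn.functional as F

def difficulty_weighted_loss(logits, targets):
    # Softmax probability assigned to the true class: a proxy for
    # how "easy" the network currently finds each example.
    probs = torch.softmax(logits, dim=1)
    p_target = probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    # Weight each example by its difficulty (1 - confidence);
    # detach() keeps the weights out of the gradient computation,
    # and they are recomputed from the outputs at every iteration.
    weights = (1.0 - p_target).detach()
    per_example = F.cross_entropy(logits, targets, reduction="none")
    return (weights * per_example).mean()

Because the weights are recomputed from the current outputs at each iteration, no difficulty labels need to be prepared in advance, which is the property contrasted above with conventional curriculum learning.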