Abstract
Face recognition has achieved great success owing to the rapid development of deep neural networks in the past few years. Different loss functions can be used in a deep neural network, resulting in different levels of performance. Several recently proposed loss functions have advanced the state of the art. However, they cannot solve the problem of margin bias, which arises in class-imbalanced datasets with so-called long-tailed distributions. In this paper, we propose to solve the margin bias problem by setting a minimum margin for all pairs of classes. We present a new loss function, Minimum Margin Loss (MML), which aims to enlarge the margin between overly close class centre pairs so as to enhance the discriminative ability of the deep features. MML, together with Softmax Loss and Centre Loss, supervises the training process to balance the margins of all classes irrespective of their class distributions. We implemented MML in Inception-ResNet-v1 and conducted extensive experiments on seven face recognition benchmark datasets: MegaFace, FaceScrub, LFW, SLLFW, YTF, IJB-B and IJB-C. Experimental results show that the proposed MML loss function achieves a new state of the art in face recognition by reducing the negative effect of margin bias.
Introduction
In the past ten years, deep neural network (DNN) based methods have achieved great progress in various computer vision tasks, including face recognition [1], person re-identification [2], object detection [3] and action recognition [4]. The progress in face recognition is particularly remarkable, owing largely to two factors: larger face datasets and better loss functions. The quantity and quality of the face datasets used for training directly influence the performance of a DNN model in face recognition. Currently, a few large-scale face datasets are publicly available, for example, MS-Celeb-1M [5], VGGFace2 [6], MegaFace [7] and CASIA WebFace [8]. As shown in Table 1, CASIA WebFace consists of 0.5M face images; VGGFace2 contains 3M face images in total, but from only 9K identities; MS-Celeb-1M and MegaFace both contain more images and more identities, and thus should have greater potential for training a better DNN model.

However, both MS-Celeb-1M and MegaFace suffer from long-tailed distributions [9], meaning that a minority of identities own the majority of the face images while a large number of identities have very few face images. When trained on a dataset with a long-tailed distribution, a model tends to overfit the classes with rich samples, weakening its generalisation ability on the long-tailed portion [9]. Specifically, the classes with rich samples tend to have relatively large margins between their class centres; conversely, the classes with limited samples tend to have relatively small margins between their class centres, as they occupy only a small region of the feature space and are thus easily compressed. This margin bias problem, caused by the long-tailed class distribution, leads to a performance drop in face recognition [9].
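To make the minimum margin idea concrete, the following is a minimal sketch in PyTorch of how such a constraint could be imposed on class centres (for instance, the centres maintained by Centre Loss): any pair of centres closer than a minimum margin m incurs a penalty. The Euclidean distance, the hinge form of the penalty and the function name minimum_margin_penalty are illustrative assumptions for this sketch, not the paper's exact MML formulation, which is derived in the following section.

```python
import torch


def minimum_margin_penalty(centres: torch.Tensor, m: float) -> torch.Tensor:
    """centres: (num_classes, feat_dim) matrix of class centres, e.g. the
    centres maintained by Centre Loss. Returns the mean hinge penalty over
    all unordered pairs of distinct classes whose centres are closer than m."""
    # pairwise Euclidean distances between all class centres
    dists = torch.cdist(centres, centres)
    num_classes = centres.size(0)
    # indices of each unordered pair of distinct classes, counted once
    i, j = torch.triu_indices(num_classes, num_classes, offset=1)
    pair_dists = dists[i, j]
    # hinge: zero once a pair is at least m apart, positive otherwise,
    # so only overly close centre pairs are pushed apart
    return torch.clamp(m - pair_dists, min=0.0).mean()


# usage: combined with Softmax Loss and Centre Loss during training
centres = torch.randn(10, 128, requires_grad=True)  # toy centres
loss = minimum_margin_penalty(centres, m=1.0)
loss.backward()
```

Because the penalty is zero for pairs already separated by at least m, well-separated head classes are left untouched while under-separated tail classes receive a gradient that enlarges their margins, which is the intended counter to margin bias.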