Abstract
Despite the great progress in face recognition achieved by deep convolutional neural networks (CNNs), these models often struggle in real-world tasks where the training images, gathered from the Internet, differ from the test images in lighting conditions, pose and image quality. These factors increase the domain discrepancy between the training (source domain) and testing (target domain) databases and degrade the learned models in deployment. Meanwhile, due to the lack of labeled target data, directly fine-tuning the pre-trained models is intractable and impractical. In this paper, we propose a new clustering-based domain adaptation method designed for the face recognition task, in which the source and target domains do not share any classes. Our method learns discriminative target features by aligning the feature domains globally while, at the same time, distinguishing the target clusters locally. Specifically, it first learns a more reliable representation for clustering by minimizing the global domain discrepancy to reduce the domain gap, then applies a simplified spectral clustering method to generate pseudo-labels in the domain-invariant feature space, and finally learns a discriminative target representation. Comprehensive experiments on the widely used GBU, IJB-A/B/C and RFW databases clearly demonstrate the effectiveness of our newly proposed approach. State-of-the-art performance on the GBU dataset is achieved with only unsupervised adaptation from the target training data.
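To make the three-stage pipeline concrete, the following is a minimal sketch, not the paper's actual implementation: it assumes a PyTorch feature extractor `backbone` and classifier head `classifier`, uses a linear-kernel MMD as a stand-in for the global domain-discrepancy term, and substitutes scikit-learn's SpectralClustering for the paper's simplified spectral clustering; `num_clusters` and the trade-off weight `lam` are hypothetical.

```python
# Hypothetical sketch of one clustering-based adaptation round.
# `backbone`, `classifier`, `num_clusters` and `lam` are assumptions,
# not the authors' implementation.
import torch
import torch.nn.functional as F
from sklearn.cluster import SpectralClustering

def mmd_loss(source_feat, target_feat):
    """Linear-kernel MMD: squared distance between the mean source and
    target features; a stand-in for the global discrepancy term."""
    return (source_feat.mean(dim=0) - target_feat.mean(dim=0)).pow(2).sum()

@torch.no_grad()
def pseudo_labels(backbone, target_images, num_clusters):
    """Step 2: cluster target features in the (partially) aligned
    space to produce pseudo-labels for discriminative training."""
    feats = F.normalize(backbone(target_images), dim=1).cpu().numpy()
    sc = SpectralClustering(n_clusters=num_clusters,
                            affinity="nearest_neighbors")
    return torch.as_tensor(sc.fit_predict(feats))

def adaptation_step(backbone, classifier, optimizer,
                    src_x, tgt_x, tgt_pseudo_y, lam=0.1):
    """Steps 1 and 3: align the domains globally (MMD) while learning
    discriminative target features from the pseudo-labels."""
    src_f, tgt_f = backbone(src_x), backbone(tgt_x)
    loss = (F.cross_entropy(classifier(tgt_f), tgt_pseudo_y)
            + lam * mmd_loss(src_f, tgt_f))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In a training loop one would alternate: refresh the pseudo-labels with `pseudo_labels(...)` every few epochs as the feature space improves, then run `adaptation_step(...)` over mini-batches; the specific alternation schedule here is an assumption.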
Introduction
Benefiting from convolutional neural networks (CNNs) [1, 2, 3, 4, 5], deep face recognition (FR) has become the most effective biometric technique for identity authentication and is widely used in areas such as the military, finance, public security, and daily life. However, deep networks that perform perfectly on benchmark datasets may fail badly in real-world applications. This is because the set of real-world images is infinitely large, so no dataset, however big, can represent the full complexity of the real world. Persuasive evidence comes from P. J. Phillips' study [6], which conducted a cross-benchmark assessment of the VGG model [7] for face recognition. The VGG model, trained on over 2.6 million face images of celebrities from the Web, is a typical FR system and achieves 98.95% on LFW [8] and 97.30% on YTF [9]. However, it obtains only 26%, 52% and 85% on the Ugly, Bad and Good partitions of the GBU database, even though all of the images in GBU are nominally frontal. The main reason is the difference in distribution between the training data (source domain) and the testing data (target domain), referred to as domain or covariate shift. Visual examples of this domain shift are shown in Fig. 1. Each dataset in Fig. 1 displays a unique "signature", and one can easily distinguish the datasets by these signatures alone, which demonstrates the existence of significant discrepancies. The images in CASIA-WebFace [10] are collected from the Internet under unconstrained conditions, and most of the subjects are celebrities and public figures photographed in ambient lighting; GBU [11] contains still frontal face images taken with a digital camera outdoors or indoors in atriums and hallways; IJB-A [12] covers large pose variations and contains many blurry video frames. The images in the GBU and IJB-A datasets may thus be closer to those in real life, which are taken with digital cameras under diverse shooting conditions and contain larger variations.