In this paper, we introduce a new large-scale face dataset named VGGFace2. The dataset contains 3.31 million images of 9131 subjects, with an average of 362.6 images for each subject. Images are downloaded from Google Image Search and have large variations in pose, age, illumination, ethnicity and profession (e.g. actors, athletes, politicians).
The dataset was collected with three goals in mind: (i) to have both a large number of identities and also a large number of images for each identity; (ii) to cover a large range of pose, age and ethnicity; and (iii) to minimise the label noise. We describe how the dataset was collected, in particular the automated and manual filtering stages to ensure a high accuracy for the images of each identity.
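One automatic filtering stage of the kind described above can be sketched as follows. This is a minimal illustration only: the function name, the source of the per-image identity-classifier scores, and the threshold value are all assumptions, not the paper's actual multi-stage pipeline.

```python
# Sketch of a single automatic-filtering step: keep only images whose
# identity-classifier confidence clears a threshold. Names, scores, and
# the threshold are illustrative assumptions, not the paper's pipeline.
def filter_by_confidence(images, scores, threshold=0.5):
    """Return the images whose classifier score exceeds `threshold`."""
    return [img for img, s in zip(images, scores) if s > threshold]

kept = filter_by_confidence(["img1.jpg", "img2.jpg", "img3.jpg"],
                            [0.92, 0.31, 0.77])
print(kept)  # ['img1.jpg', 'img3.jpg']
```

In the actual pipeline such automatic stages are followed by manual checks, since a single confidence threshold alone cannot guarantee low label noise.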
To assess face recognition performance using the new dataset, we train ResNet-50 (with and without Squeeze-and-Excitation blocks) Convolutional Neural Networks on VGGFace2, on MS-Celeb-1M, and on their union, and show that training on VGGFace2 leads to improved recognition performance across pose and age. Finally, using the models trained on these datasets, we demonstrate state-of-the-art performance on the IJB-A and IJB-B face recognition benchmarks, exceeding the previous state of the art by a large margin. The dataset and models are publicly available.
Concurrent with the rapid development of deep Convolutional Neural Networks (CNNs), there has been much recent effort in collecting large-scale datasets to feed these data-hungry models. In general, recent datasets (see Table I) have explored the importance of intra- and inter-class variation. The former focuses on depth (many images of one subject) and the latter on breadth (many subjects with limited images per subject). However, none of these datasets was specifically designed to explore pose and age variation. We address that here by designing a dataset generation pipeline to explicitly collect images with a wide range of pose, age, illumination and ethnicity variations of human faces.
We make the following four contributions: first, we have collected a new large-scale dataset, VGGFace2, for public release. It includes over nine thousand identities with between 80 and 800 images for each identity, and more than 3M images in total; second, a dataset generation pipeline is proposed that encourages pose and age diversity for each subject, and also involves multiple stages of automatic and manual filtering in order to minimise label noise; third, we provide template annotation for the test set to explicitly explore pose and age recognition performance; and, finally, we show that training deep CNNs on the new dataset substantially exceeds the state-of-the-art performance on the IJB benchmark datasets. In particular, we experiment with the recent Squeeze-and-Excitation network, and also investigate the benefits of first pre-training on a dataset with breadth (MS-Celeb-1M) and then fine-tuning on VGGFace2.
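The Squeeze-and-Excitation operation used in the experiments can be sketched schematically as follows: global average pooling "squeezes" each channel to a scalar, a small bottleneck network produces per-channel gates in (0, 1), and the input feature map is rescaled channel-wise. This is a NumPy sketch with random weights for illustration; in the actual SENet the two projections are learned fully-connected layers inside each residual block.

```python
import numpy as np

def se_block(feature_map, reduction=16, rng=None):
    """Schematic Squeeze-and-Excitation block (random weights, for illustration).

    feature_map: array of shape (C, H, W).
    """
    rng = np.random.default_rng(0) if rng is None else rng
    C = feature_map.shape[0]
    # Squeeze: global average pooling over the spatial dimensions -> (C,)
    z = feature_map.mean(axis=(1, 2))
    # Excitation: bottleneck FC -> ReLU -> FC -> sigmoid (weights random here;
    # learned in the real network)
    w1 = rng.standard_normal((C // reduction, C)) * 0.1
    w2 = rng.standard_normal((C, C // reduction)) * 0.1
    s = np.maximum(w1 @ z, 0.0)                # ReLU
    gate = 1.0 / (1.0 + np.exp(-(w2 @ s)))     # sigmoid, per-channel gates in (0, 1)
    # Scale: reweight each channel of the input feature map
    return feature_map * gate[:, None, None]

x = np.ones((64, 7, 7))
y = se_block(x)
print(y.shape)  # (64, 7, 7) -- same shape, channels rescaled
```

Because the gates lie strictly in (0, 1), the block can only attenuate channels here; what the trained network learns is which channels to emphasise relative to the others.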
In this work, we have proposed a pipeline for collecting a high-quality dataset, VGGFace2, with a wide range of pose and age. Furthermore, we demonstrate that deep models (ResNet-50 and SENet) trained on VGGFace2 achieve state-of-the-art performance on the IJB-A and IJB-B benchmarks. The dataset and models are available at https://www.robots.ox.ac.uk/~vgg/data/vgg_face2/.