Abstract
1- Introduction
2- Related work
3- Proposed method
4- Experiments
5- Conclusion
References
Abstract
Face detection constitutes a key visual information analysis task in Machine Learning. The rise of Big Data has led to the accumulation of massive volumes of visual data that require fast and accurate analysis. Deep Learning methods are powerful approaches to this task, since training them on large amounts of highly variable data has been shown to significantly enhance their effectiveness; however, such training often requires expensive computations and leads to highly complex models. When the objective is to analyze visual content in massive datasets, model complexity becomes crucial to success. In this paper, a lightweight deep Convolutional Neural Network (CNN) for face detection is introduced. It is designed to minimize training and testing time, and it outperforms previously published deep convolutional networks on this task in terms of both effectiveness and efficiency. To train this lightweight deep network without compromising its efficiency, a new training method based on progressive positive and hard negative sample mining is introduced and shown to drastically improve both training speed and accuracy. Additionally, a separate deep network was trained to detect individual facial features, and a model that combines the outputs of the two networks was created and evaluated. Both methods can detect faces under severe occlusion and unconstrained pose variation, meet the challenges of large-scale, real-world, real-time face detection, and are suitable for deployment even on mobile platforms such as Unmanned Aerial Vehicles (UAVs).
1- Introduction
The spread of social media and the rise of the Internet of Things have significantly increased the amount of data readily available in all aspects of human life and ushered in the era of Big Data. This vast volume of data has the potential to enhance our understanding of the current state of the world and to enable more accurate predictions of its future state [1]. For this reason, Big Data has already been exploited in many fields, including biometric systems [2, 3, 4], where face detection plays a critical auxiliary role in improving face recognition, as facial features capture a large part of a person's individuality. In fact, face detection has been an active research area in computer vision for more than three decades, mainly because countless applications require it as a first step [5, 6, 7, 8]. Nowadays, commercial and professional robotic units (e.g., drones) and mobile devices such as smartphones and tablets provide users with various face-based applications related to face detection, such as intelligent video shooting, privacy-preserving navigation and control, security-enhancing applications, automatic annotation of visual content, and affective computing, including face and facial expression recognition and tracking [9]. In Unmanned Aerial Vehicles (UAVs) in particular, face detection may serve as a tool to guide the on-board camera towards the faces of people of interest. For example, at sports events, face detection may be the first step towards recognizing important athletes, such as cyclists in professional cycling races. However, the limited availability of powerful Graphics Processing Units (GPUs) on such devices restricts which algorithms can run efficiently.
Recently released on-board mobile GPUs for drones are approximately ten times slower than desktop GPUs and have only a fraction of their RAM, a constraint that renders most published deep learning algorithms inadequate for such applications. The challenges of face detection and recognition using drones have been studied in [10]. Over the last decade, many methods not based on neural networks have been proposed and deployed in commercial products such as digital cameras and smartphones. The influential work of Viola and Jones [11] made real-time face detection possible, albeit with limited effectiveness, and later inspired many cascade-based methods. Since then, face detection research has made remarkable progress thanks to the availability of data captured in unconstrained conditions, the development of publicly available benchmarks, and the rapid growth in the computational power of modern computers. The introduction of feature extraction methods such as Histograms of Oriented Gradients (HoG) [12], Speeded Up Robust Features (SURF) [13], and Integral Channel Features (ICF) [14] also greatly benefited face detection algorithms. In another approach, a mixture of trees was used for unified face detection, pose estimation, and landmark localization [15].
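Part of what made the Viola and Jones detector [11] fast enough for real-time use is that its rectangular Haar-like features can be evaluated in constant time from an integral image (a summed-area table); Integral Channel Features [14] rely on the same idea applied to several image channels. The following is a minimal, generic Python sketch of that data structure under simplifying assumptions (a single grayscale channel stored as nested lists). It is neither the detector of [11] nor the method proposed in this paper, and the function names and the particular two-rectangle feature are hypothetical illustrations.

```python
from typing import List


def integral_image(img: List[List[int]]) -> List[List[int]]:
    """Build a summed-area table padded with a leading zero row and column.

    ii[y][x] holds the sum of img[0..y-1][0..x-1]; the padding removes
    special cases at the image border.
    """
    h = len(img)
    w = len(img[0]) if h else 0
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0  # running sum of img[y][0..x]
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: List[List[int]], top: int, left: int, height: int, width: int) -> int:
    """Sum of any height x width rectangle at (top, left) using four lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii: List[List[int]], top: int, left: int, height: int, width: int) -> int:
    """Hypothetical two-rectangle Haar-like feature: left half minus right half.

    width is assumed to be even; the point is only that such a feature
    reduces to a constant number of table lookups, whatever its size.
    """
    half = width // 2
    return rect_sum(ii, top, left, height, half) - rect_sum(ii, top, left + half, height, half)


if __name__ == "__main__":
    img = [
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9],
    ]
    ii = integral_image(img)
    print(rect_sum(ii, 0, 0, 3, 3))  # whole image: 45
    print(rect_sum(ii, 1, 1, 2, 2))  # 5 + 6 + 8 + 9 = 28
```

Once the table is built in a single pass over the image, each rectangle sum costs four lookups regardless of the rectangle's size, which is why scanning many window positions and scales remains cheap on limited hardware.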