Abstract
Person re-identification (Re-ID), which aims to match pedestrians across disjoint camera views in surveillance, has made great progress under supervised learning. However, the need for a large number of labelled identities makes large-scale Re-ID systems costly. Consequently, it is important to study Re-ID learning with unlabelled data and limited labelled data, that is, semi-supervised person re-identification. When labelled data is limited, the learned model tends to overfit the data and generalizes poorly. Moreover, scene variations between cameras cause domain shift in the feature space, which makes mining auxiliary supervision information from unlabelled data more difficult. To address these problems, we propose a Distilled Camera-Aware Self Training framework for semi-supervised person re-identification. To alleviate overfitting when learning from limited labelled data, we propose a Multi-Teacher Selective Similarity Distillation Loss that selectively aggregates the knowledge of multiple weak teacher models, each trained on a different subset, and distills it into a stronger student model. Then, we exploit the unlabelled data for self-training by generating pseudo labels through clustering the features extracted by the student model. To alleviate the effect of scene variations between cameras, we propose a Camera-Aware Hierarchical Clustering (CAHC) algorithm that performs intra-camera clustering and cross-camera clustering hierarchically. Experiments show that our method outperforms state-of-the-art semi-supervised person re-identification methods.
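The two-level idea behind CAHC can be illustrated with a minimal sketch. It is not the authors' implementation: the cosine distance, the average-linkage merge rule, the use of intra-camera cluster centroids in the cross-camera stage, the thresholds `intra_thr` and `cross_thr`, and the function names `threshold_clustering` and `camera_aware_hierarchical_clustering` are all hypothetical choices made for illustration. The input features stand in for embeddings produced by the student model.

```python
import numpy as np


def l2_normalize(x):
    # Row-wise L2 normalization so that dot products equal cosine similarities.
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def threshold_clustering(feats, threshold):
    """Average-linkage agglomerative clustering on cosine distance (hypothetical helper).

    Repeatedly merges the closest pair of clusters and stops once the closest
    pair is farther apart than `threshold`. `feats` must be L2-normalized.
    Returns integer labels 0..k-1.
    """
    n = len(feats)
    dist = 1.0 - feats @ feats.T          # pairwise cosine distances
    clusters = [[i] for i in range(n)]    # start from singletons
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs.
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]                   # b > a, so index a stays valid
    labels = np.empty(n, dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels


def camera_aware_hierarchical_clustering(feats, cams, intra_thr=0.3, cross_thr=0.5):
    """Cluster within each camera first, then cluster the results across cameras.

    Assumption: the cross-camera stage clusters the normalized centroids of the
    intra-camera clusters; the paper's exact cross-camera rule may differ.
    """
    feats = l2_normalize(np.asarray(feats, dtype=float))
    cams = np.asarray(cams)
    local_id = np.empty(len(feats), dtype=int)  # sample -> intra-camera cluster index
    centroids = []
    # Stage 1: intra-camera clustering, so samples are compared under one scene.
    for c in np.unique(cams):
        idx = np.flatnonzero(cams == c)
        lab = threshold_clustering(feats[idx], intra_thr)
        for k in range(lab.max() + 1):
            members = idx[lab == k]
            local_id[members] = len(centroids)
            centroids.append(feats[members].mean(axis=0))
    # Stage 2: cross-camera clustering of the intra-camera cluster centroids.
    centroids = l2_normalize(np.stack(centroids))
    global_lab = threshold_clustering(centroids, cross_thr)
    # Each sample inherits the pseudo label of its intra-camera cluster.
    return global_lab[local_id]


if __name__ == "__main__":
    # Toy data: two identities seen by two cameras; the third dimension
    # mimics a camera-specific shift such as illumination.
    feats = [[1.0, 0.0, 0.3], [0.9, 0.1, 0.3], [0.0, 1.0, 0.3], [0.1, 0.9, 0.3],
             [1.0, 0.0, -0.3], [0.9, 0.1, -0.3], [0.0, 1.0, -0.3], [0.1, 0.9, -0.3]]
    cams = [0, 0, 0, 0, 1, 1, 1, 1]
    print(camera_aware_hierarchical_clustering(feats, cams))  # prints [0 0 1 1 0 0 1 1]
```

In this sketch, clustering within each camera first means samples are grouped only under shared scene conditions; the cross-camera stage then bridges the domain shift between compact per-camera groups rather than between individual samples. The resulting labels would serve as pseudo labels for self-training.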
I. Introduction
Person re-identification (Re-ID) has received much attention in recent years due to its significance in video surveillance applications. When abundant labelled data is available, many works [1]–[7] have made great progress in supervised learning. However, labelling cost must be taken into account in large-scale Re-ID systems that consist of many cameras. To reduce this cost, semi-supervised learning, which exploits unlabelled data together with limited labelled data, is a practical solution. Unsupervised person re-identification [8]–[15] has been studied to learn representations from unlabelled data, but these methods do not consider how to learn effectively from limited labelled data. So far, semi-supervised person re-identification [16]–[20] remains under-explored.

For semi-supervised Re-ID, exploiting unlabelled data and limited labelled data raises two challenges. First, insufficient training data causes the model to overfit, which degrades its generalization performance. Second, scene variations between cameras, such as illumination, background, and viewpoint, cause domain shift in the feature space, which makes it difficult to mine auxiliary supervision information from unlabelled data to assist model training. The effect of scene variations is discussed further in Section III-B. To address these challenges, we propose a Distilled Camera-Aware Self Training framework, as shown in Figure 1.