Abstract
Cross-modality person re-identification between the visible and infrared domains is important for night-time surveillance but extremely challenging. Besides the cross-modality discrepancies caused by different camera spectra, visible-infrared person re-identification (VI-REID) suffers, like traditional person re-identification, from pedestrian misalignment and from variations caused by different camera viewpoints and pedestrian pose deformations. In this paper, we propose a multi-path adaptive pedestrian alignment network (MAPAN) to learn discriminative feature representations. The multi-path network learns features directly from the data in an end-to-end manner and aligns pedestrians adaptively without any additional manual annotations. To alleviate the intra-modality discrepancies caused by image misalignment, we combine the aligned visible image features with the original visible image features and strengthen the network's attention on pedestrians, which significantly improves the discriminability of the learned features. To mitigate the cross-modality discrepancies between the visible and infrared domains, the discriminative features of the two modalities are mapped into the same feature embedding space, and the identity loss and the triplet loss are combined as the overall loss. Extensive experiments demonstrate the superior performance of the proposed method compared with state-of-the-art methods.
Introduction
Person re-identification (ReID) is a computer vision technique for identifying a specific pedestrian as (numerically) the same particular as one encountered on a previous occasion [1]. It is generally considered a sub-problem of image retrieval and has promising applications in intelligent monitoring. ReID nevertheless faces great challenges, such as low camera resolution and various pedestrian pose deformations. Images of the same pedestrian captured by different cameras may also differ enormously in appearance due to occlusion, varying viewpoints, illumination variations, etc. Despite these difficulties, traditional person re-identification has made great progress in recent years, including many supervised methods [2]–[11] as well as unsupervised or weakly supervised methods [12]–[19]. Most existing methods are based on feature representation learning [2], [5], [7], [20] or metric learning [3], [6], [10], [16]. More recent methods tend to exploit body part-based features and semantic information [8], [11], [17], [23] or attention mechanisms [9], [21], [22] to achieve higher recognition accuracy. However, all the traditional person re-identification methods mentioned above match visible images only against visible images, whereas visible cameras cannot capture clear images under poor illumination. Fortunately, most cameras today are also equipped with an infrared imaging function.