Abstract
Visual object tracking in unconstrained environments is a challenging task in computer vision. One challenging issue is how to design an efficient, discriminative feature representation. To improve the adaptability of the tracker to large object appearance changes, the observation model needs to be updated online. However, a bad model update using inaccurate training samples can lead to the model drift problem. Therefore, how to design an efficient online observation model and a model update strategy are two other challenging issues. This paper proposes the concatenation of a histogram of oriented gradients variant (HOGv) and a color histogram as the feature representation to balance discriminative power and efficiency. A single-hidden-layer feedforward neural network (SFNN) is used as the observation model, and the recursive orthogonal least squares (ROLS) algorithm is used to update the model online. A bidirectional tracking scheme is designed to alleviate the model drift problem during online tracking. The proposed bidirectional tracking scheme consists of three modules: the forward tracking module, the backward tracking module and the integration module. The forward tracking module first finds all the candidate regions; the backward tracking module then calculates the confidence of each candidate region according to historical information. Finally, the integration module combines the results of the first two modules to determine the final tracked object and the model update strategy for the current frame. Extensive evaluations on existing tracking benchmarks show that the proposed tracking framework yields significant performance improvements over the base tracker and outperforms most state-of-the-art trackers.
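Before the formal description in the following sections, the overall flow of the three modules can be summarized in a rough sketch. The sliding-window search, distance-based confidence and threshold rule below are illustrative assumptions only; the actual forward, backward and integration modules are defined later in the paper.

import numpy as np

# Rough sketch of the three-module bidirectional scheme; the window search,
# distance-based confidence and threshold rule are assumptions made for
# illustration, not the modules defined in this paper.

def forward_track(frame, prev_box, model, radius=8, step=4):
    # Forward module: sample boxes around the previous location and keep
    # those that the online observation model scores above 0.5.
    x, y, w, h = prev_box
    candidates = [(x + dx, y + dy, w, h)
                  for dx in range(-radius, radius + 1, step)
                  for dy in range(-radius, radius + 1, step)
                  if model(frame, (x + dx, y + dy, w, h)) > 0.5]
    return candidates or [prev_box]

def backward_confidence(box, history):
    # Backward module: score a candidate by its consistency with the
    # recent history of tracked boxes.
    if not history:
        return 1.0
    past = np.asarray(history[-5:], dtype=float)
    return float(np.exp(-np.linalg.norm(past - np.asarray(box, dtype=float), axis=1).mean()))

def integrate(candidates, confidences, update_threshold=0.6):
    # Integration module: pick the final target and decide whether the
    # observation model should be updated on this frame.
    best = int(np.argmax(confidences))
    return candidates[best], confidences[best] > update_threshold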
Introduction
Visual object tracking, which estimates the trajectory of a target specified in the initial frame, is a fundamental topic in computer vision [1], [2]. It has numerous applications, such as intelligent video surveillance, intelligent transportation and human-computer interaction. Despite significant progress in recent decades, visual object tracking remains a challenging problem due to irregular appearance changes caused by partial or full occlusion, cluttered backgrounds, fast motion, deformation and illumination changes. Feature representation is one of the key factors in visual object tracking. Numerous hand-crafted features have been used for visual object tracking, such as color names [3], histograms of oriented gradients (HOG) [4] and local binary patterns (LBP) [5]. These hand-crafted features are computationally efficient but have proven less effective in complex scenes. Recently, convolutional neural networks (CNNs), with their strong capability to learn feature representations, have demonstrated state-of-the-art performance in various computer vision tasks [6]–[8]. However, in visual object tracking, it is difficult to adopt CNNs directly, since they require a large number of training samples, while only one labeled positive sample is available from the initial frame. One possible solution is to use CNNs that have been trained on other tasks with large-scale training datasets.
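As a concrete illustration of combining a gradient-based descriptor with a color histogram, the sketch below uses the standard HOG from scikit-image and a per-channel intensity histogram. These are stand-ins for the HOGv and color-histogram construction detailed later in this paper, not the exact descriptor proposed here.

import numpy as np
from skimage.color import rgb2gray
from skimage.feature import hog

def appearance_descriptor(patch, bins=16):
    # Illustrative feature: plain HOG on the grayscale patch concatenated
    # with a normalized per-channel color histogram. The paper's HOGv and
    # color-histogram construction differ in their details.
    grad_feat = hog(rgb2gray(patch), orientations=9,
                    pixels_per_cell=(8, 8), cells_per_block=(2, 2),
                    feature_vector=True)
    color_feat = np.concatenate([
        np.histogram(patch[..., c], bins=bins, range=(0, 1))[0]
        for c in range(patch.shape[-1])
    ]).astype(float)
    color_feat /= color_feat.sum() + 1e-12
    return np.concatenate([grad_feat, color_feat])

# Example: a 64x64 RGB patch with values in [0, 1]
patch = np.random.rand(64, 64, 3)
feat = appearance_descriptor(patch)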