Abstract
1-Introduction
2-Related Work
3-Method
4-Experimental Results
5-Conclusion
6-Acknowledgements
References
Abstract
This paper proposes, develops and evaluates a novel object-tracking algorithm that outperforms state-of-the-art methods in terms of robustness. The proposed method comprises Siamese networks, Recurrent Convolutional Neural Networks (RCNNs) and Long Short-Term Memory (LSTM), and performs short-term target tracking in real time. Because Siamese networks generate the tracking target for the current frame based only on the image information of the previous frame, they are less effective in handling the target’s appearance and disappearance, rapid movement, or deformation. Hence, we propose a novel tracking method that integrates improved fully convolutional Siamese networks based on all-CNN, RCNN and LSTM. To improve the training efficiency of the deep learning network, a strategy of segmented training based on transfer learning is proposed. For some test video sequences that involve background clutter, deformation, motion blur, fast motion and out-of-view targets, our method achieves the best tracking performance. Using 41 videos from the Object Tracking Benchmark (OTB) dataset and considering the area under the curve for the precision and success rate, our method outperforms the second best by 18.5% and 14.9%, respectively.
Introduction
In computer vision, visual tracking is a challenging task because of partial occlusions, target deformations, motion blur, background clutter, scale changes and illumination variations [1]. Some researchers improve the robustness of their tracking algorithms by applying machine learning [2], as in TLD (Tracking-Learning-Detection) [3] and KCF (kernelized correlation filters) [4]. Because CNNs (convolutional neural networks) have strong capabilities for learning feature representations, other researchers apply CNNs to address the challenges faced by tracking based on traditional machine learning [5]. However, approaches that are solely based on CNNs have the following problems. First, because they make no effective use of temporal continuity, these algorithms cannot handle scenarios with an obstructed target well. Second, algorithms based on deep-learning models generally have a huge number of parameters, and training such a huge model requires a large amount of data. In target tracking, by contrast, far fewer training samples are available, which limits performance.