Abstract
1- Introduction
2- Proposed framework
3- Experimental evaluation
4- Conclusion and future work
Acknowledgment
References
Abstract
Action recognition is a challenging research area in which several convolutional neural network (CNN)-based action recognition methods have recently been presented. However, such methods are inefficient at processing online data streams in real time with satisfactory accuracy. Therefore, in this paper we propose an efficient and optimized CNN-based system that processes, in real time, data streams acquired from the visual sensors of non-stationary surveillance environments. First, frame-level deep features are extracted using a pre-trained CNN model. Next, an optimized deep autoencoder (DAE) is introduced to learn the temporal changes of the actions in the surveillance stream. Furthermore, a non-linear learning approach, a quadratic SVM, is trained to classify human actions. Finally, an iterative fine-tuning process is added to the testing phase to update the parameters of the trained model using newly accumulated data from the non-stationary environment. Experiments are conducted on benchmark datasets, and the results show better performance of our system in terms of accuracy and running time compared to state-of-the-art methods. We believe that our proposed system is a suitable candidate for action recognition in surveillance data streams of non-stationary environments.
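As a rough illustration of how the four stages described above can be chained in a streaming loop, the following Python sketch wires them together as injected callables. It is not the authors' implementation: the class StreamingActionRecognizer, the window length, the use of non-overlapping windows, and the update interval are hypothetical choices, and the toy stand-ins in run_demo replace the pre-trained CNN, the DAE, and the quadratic SVM.

```python
from collections import deque
from typing import Any, Callable, Deque, List, Optional, Tuple

Vector = List[float]


class StreamingActionRecognizer:
    """Hypothetical skeleton: frame features -> temporal encoding -> classifier,
    with periodic fine-tuning on windows accumulated from the stream.

    Every stage is an injected callable; none of them is the authors' model.
    """

    def __init__(
        self,
        extract_features: Callable[[Any], Vector],          # stand-in for the pre-trained CNN
        encode_sequence: Callable[[List[Vector]], Vector],  # stand-in for the deep autoencoder
        classify: Callable[[Vector], str],                  # stand-in for the quadratic SVM
        fine_tune: Callable[[List[Vector]], None],          # stand-in for the iterative update
        window_size: int = 15,    # assumed number of frames per action window
        update_every: int = 4,    # assumed number of windows between updates
    ) -> None:
        self.extract_features = extract_features
        self.encode_sequence = encode_sequence
        self.classify = classify
        self.fine_tune = fine_tune
        self.window_size = window_size
        self.update_every = update_every
        self._window: Deque[Vector] = deque()
        self._accumulated: List[Vector] = []

    def push_frame(self, frame: Any) -> Optional[str]:
        """Consume one frame; return a label once a full window has been seen."""
        self._window.append(self.extract_features(frame))  # frame-level features
        if len(self._window) < self.window_size:
            return None
        code = self.encode_sequence(list(self._window))    # one vector per window
        self._window.clear()                                # non-overlapping windows (assumption)
        label = self.classify(code)
        self._accumulated.append(code)                      # newly accumulated stream data
        if len(self._accumulated) >= self.update_every:
            self.fine_tune(self._accumulated)               # adapt to recent data
            self._accumulated = []
        return label


def run_demo() -> Tuple[List[str], List[int]]:
    """Toy stand-ins: a 'frame' is a number and the 'encoding' is the window mean."""
    update_sizes: List[int] = []
    recognizer = StreamingActionRecognizer(
        extract_features=lambda frame: [float(frame)],
        encode_sequence=lambda feats: [sum(f[0] for f in feats) / len(feats)],
        classify=lambda code: "high" if code[0] > 10 else "low",
        fine_tune=lambda batch: update_sizes.append(len(batch)),
        window_size=5,
        update_every=3,
    )
    outputs = [recognizer.push_frame(t) for t in range(30)]
    labels = [label for label in outputs if label is not None]
    return labels, update_sizes


if __name__ == "__main__":
    print(run_demo())
```

With 30 toy frames, windows of 5, and an update every 3 windows, the demo prints (['low', 'low', 'high', 'high', 'high', 'high'], [3, 3]): six labels and two fine-tuning calls. Keeping every stage as a callable separates the streaming control flow from the models, so a real feature extractor, autoencoder, or classifier could be dropped in without changing the loop.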
1- Introduction
Human action recognition encompasses many important real-life domains, such as intelligent video surveillance, detection of abnormal and suspicious actions, action-based video retrieval, video semantic recognition, and patient monitoring in healthcare centers [1-3]. Action recognition on online data streams has numerous applications, such as monitoring through visual sensors in surveillance, videos from websites, and social media feeds, where it can help detect anomalies, fraud, or other abnormal situations as they arise [4]. In videos, human actions can be recognized from the movement of different body parts such as the hands and legs. A single still image cannot convey the whole idea of an action [5]. For example, jumping for a header in football and skipping rope share the same pose in the initial frame; the two actions can only be discriminated over a sequence of frames. Analyzing the movements of the human body across a frame sequence, together with its interaction with the surroundings, leads to accurate recognition of actions in the video data stream [6, 7].
In non-stationary data streams, whenever the new data differ from the previous data, a model trained on the previous data can no longer be considered effective enough. The reason is that it cannot adapt to the new data distribution, and adapting to a non-stationary environment requires diversity [8]. To overcome this issue, Lobo et al. [9] formulated it as an optimization problem solved by a bio-inspired algorithm that accounts for the heterogeneity of drifts, and achieved high diversity through a self-learning optimization technique. In another novel approach, Krawczyk et al. [10] modified the weighted one-class SVM to improve it for non-stationary streaming data analysis. They claimed that the one-class classifier can adapt its decision boundary to new data streams, while a forgetting mechanism helps the model re-learn its parameters. Similarly, Bartosz et al. [11] presented an efficient ensemble learning technique for recognizing activities in real time. The system iteratively modifies the weights of the Naïve Bayes classifier, making it adapt smoothly to the current state of the stream even without an external drift detector. Abdallah et al. [12] presented a detailed survey of activity recognition in online data stream mining. Moreover, accurately recognizing human actions in real time from online surveillance data streams is a highly challenging task due to the computation of high-dimensional features, variations in viewpoint and motion, cluttered backgrounds, occlusion, and different illumination conditions [13-15].
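The weight-update rule of [11] is not described here. As a generic illustration of the underlying idea, an ensemble that keeps adapting by re-weighting its members instead of waiting for an external drift detector, the following Python sketch uses the classic weighted-majority update of Littlestone and Warmuth, in which every member that errs has its weight multiplied by a penalty factor. This is a swapped-in technique, not the method of [11]; the class and function names, the two toy members, the penalty beta = 0.5, and the weight floor are all assumptions.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Classifier = Callable[[Sequence[float]], str]


class WeightedMajorityEnsemble:
    """Weighted-majority vote (after Littlestone and Warmuth) with an added weight floor.

    Illustrates detector-free adaptation only; it is not the update rule of [11].
    """

    def __init__(self, members: List[Classifier], beta: float = 0.5, floor: float = 1e-3) -> None:
        if not 0.0 < beta < 1.0:
            raise ValueError("beta must lie in (0, 1)")
        self.members = members
        self.beta = beta      # multiplicative penalty applied to a member that errs
        self.floor = floor    # minimum weight, so a member can regain influence after a drift
        self.weights = [1.0 / len(members)] * len(members)

    def predict(self, x: Sequence[float]) -> str:
        scores: Dict[str, float] = {}
        for member, weight in zip(self.members, self.weights):
            label = member(x)
            scores[label] = scores.get(label, 0.0) + weight
        return max(scores, key=lambda label: scores[label])  # ties go to the first label seen

    def update(self, x: Sequence[float], y_true: str) -> None:
        """Penalize wrong members, then renormalize; no drift detector is involved."""
        for i, member in enumerate(self.members):
            if member(x) != y_true:
                self.weights[i] = max(self.weights[i] * self.beta, self.floor)
        total = sum(self.weights)
        self.weights = [w / total for w in self.weights]


def run_drift_demo() -> Tuple[List[str], List[float]]:
    """Two toy members and an abrupt concept drift from 'walk' to 'run'."""
    members: List[Classifier] = [lambda x: "walk", lambda x: "run"]
    ensemble = WeightedMajorityEnsemble(members, beta=0.5)
    stream = [([0.0], "walk")] * 5 + [([1.0], "run")] * 8
    predictions: List[str] = []
    for x, y in stream:
        predictions.append(ensemble.predict(x))  # test first ...
        ensemble.update(x, y)                    # ... then train (prequential order)
    return predictions, ensemble.weights


if __name__ == "__main__":
    print(run_drift_demo())
```

In the demo, the ensemble follows the "walk" member for the first five samples. After the abrupt switch to "run", each wrong vote halves that member's relative weight, so by the seventh sample after the drift the ensemble predicts "run" without any explicit drift signal; the floor only matters for longer streams, where it stops a member's weight from shrinking so far that it could never recover.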