Violence action recognition using region proposal in region convolution neural network

Abstract

An autonomous system that raises an alarm for violent or suspicious incidents could greatly strengthen security systems. Such a system could also be useful in other applications, such as patient monitoring, retail shops, and child surveillance. However, current technology has not yet reached the level needed to analyze video effectively, since most video surveillance systems cannot understand the events happening in the video. Complex changes in the environment caused by camera motion, dynamic scenes such as crowds, changes in lighting intensity, and different viewing angles, together with wide spatial variation (e.g., the size of the subject of interest relative to the video) and temporal variation (e.g., the speed at which subjects perform actions), make video analysis a very challenging task. Despite these difficulties, research on improving video analysis methods remains active. Some approaches to violent incident detection resemble the methods used to detect abnormal incidents. Instead of detecting whether an incident has occurred, we attempt to build a model that detects the actions related to violence. In this paper, an online detection model is built to detect specific violence-related actions. The model is built with reference to image object detection (Faster Region-based Convolutional Neural Network, Faster R-CNN) and video action detection (Tube Convolutional Neural Network, T-CNN).

Keywords

Online model, video action detection
