Light weight convolutional models with spiking neural network based human action recognition

Abstract

Although deep learning networks have proven able to perform video analytics in complex environments, attention is increasingly turning to compact networks that facilitate edge processing. This effort has yielded high-performance compact models and tools such as MobileNet, PWCNet and BindsNet. In the work proposed herein, a dual-network configuration is used for human action recognition: MobileNet captures the spatial appearance of the action sequences, and PWCNet extracts the motion vectors. A novel Spiking Neural Network (SNN) based configuration, implemented with BindsNet, serves as the classifier. The proposed configuration is experimentally validated on two challenging datasets, HMDB51 and UCF101. The experimental results show that the proposed approach outperforms state-of-the-art techniques in most cases and is comparable to them in a few.
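The pipeline described above (appearance features and motion features combined, then classified by spiking neurons) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not state how the two feature streams are combined, how features are converted to spikes, or how the classifier is read out. The sketch concatenates the two vectors, uses Bernoulli rate coding, and substitutes a plain leaky integrate-and-fire layer for the BindsNet-based classifier. The function names `fuse_features`, `rate_encode`, `lif_spike_counts` and `classify`, the hand-set weights, and all parameter values are hypothetical.

```python
import random


def fuse_features(appearance, motion):
    """Concatenate two feature vectors and min-max scale the result to [0, 1].

    Assumption: simple concatenation; the paper's fusion scheme is not given here.
    """
    fused = list(appearance) + list(motion)
    lo, hi = min(fused), max(fused)
    if hi == lo:
        return [0.0] * len(fused)
    return [(v - lo) / (hi - lo) for v in fused]


def rate_encode(intensities, time_steps, max_prob=0.5, seed=0):
    """Bernoulli rate coding: input i spikes at each step with probability
    max_prob * intensities[i]. Returns a list of time_steps spike vectors."""
    rng = random.Random(seed)
    return [
        [1 if rng.random() < max_prob * x else 0 for x in intensities]
        for _ in range(time_steps)
    ]


def lif_spike_counts(spike_train, weights, threshold=1.0, decay=0.9):
    """Simulate leaky integrate-and-fire neurons and count their output spikes.

    weights[j][i] is the (hypothetical, hand-set) weight from input i to neuron j.
    """
    potentials = [0.0] * len(weights)
    counts = [0] * len(weights)
    for spikes in spike_train:
        for j, row in enumerate(weights):
            # Leak the membrane potential, then add the weighted input spikes.
            potentials[j] = decay * potentials[j] + sum(
                w * s for w, s in zip(row, spikes)
            )
            if potentials[j] >= threshold:
                counts[j] += 1
                potentials[j] = 0.0  # reset after firing
    return counts


def classify(appearance, motion, weights, time_steps=100):
    """Return (predicted class index, per-class spike counts)."""
    fused = fuse_features(appearance, motion)
    counts = lif_spike_counts(rate_encode(fused, time_steps), weights)
    predicted = max(range(len(counts)), key=counts.__getitem__)
    return predicted, counts


# Toy example: two classes, two appearance features, two motion features.
toy_weights = [
    [0.6, 0.0, 0.6, 0.0],  # class 0 listens to inputs 0 and 2
    [0.0, 0.6, 0.0, 0.6],  # class 1 listens to inputs 1 and 3
]
predicted, counts = classify([0.9, 0.1], [0.8, 0.0], toy_weights)
print(predicted, counts)
```

In this toy setup the class whose neuron receives the strongest inputs fires most often and is chosen as the prediction. In a real system the feature vectors would come from the trained appearance and motion networks and the weights would be learned rather than set by hand.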

Keywords

MobileNet, PWCNet, BindsNet, Diehl and Cook nodes, spiking neural network

