A computationally efficient method for Human Activity Recognition based on spatio temporal cuboid and Super Normal Vector

Abstract

Over the last three decades, human activity recognition has advanced considerably, driven by less expensive RGB-D cameras and the growing volume of video data. As the number of surveillance cameras increases, manual annotation becomes difficult, and the need for automatic recognition and annotation of video arises. In this paper, we introduce a computationally efficient and storage-efficient method for recognizing human activities from depth videos, together with a new frame selection method based on the mean value of motion energy. We extract normal vectors from the points on the boundary curve. Polynormals are then obtained by sequentially concatenating the normals from a neighborhood of each point on the boundary curve. The polynormals within a spatio-temporal cuboid constructed from the input video are pooled to form the Super Normal Vectors. These Super Normal Vectors are the final feature vectors, which are given as input to the classifier, a linear SVM from LIBLINEAR. Results on the MSRAction3D dataset show that the proposed algorithm is fast and that its accuracy is comparable with existing methods: 88% when all frames are used and 89.82% when the frame selection method is applied. The proposed method is also tested on the UTD-MHAD dataset.
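The abstract does not define motion energy or the exact selection rule, so the following is only a minimal sketch of one plausible reading: motion energy is assumed to be the sum of absolute depth differences between consecutive frames, and a frame is kept when its energy is at least the mean energy over the video. The function names `motion_energy` and `select_frames` are hypothetical and are not taken from the paper.

```python
from statistics import mean


def motion_energy(video):
    """Return one energy value per pair of consecutive frames.

    `video` is a list of depth frames, each a list of rows of numbers.
    ASSUMPTION: energy = sum of absolute per-pixel depth differences;
    the paper may define motion energy differently.
    """
    energies = []
    for prev, curr in zip(video, video[1:]):
        energy = sum(
            abs(c - p)
            for row_prev, row_curr in zip(prev, curr)
            for p, c in zip(row_prev, row_curr)
        )
        energies.append(energy)
    return energies


def select_frames(video):
    """Return indices of frames whose motion energy reaches the mean.

    ASSUMPTION: the threshold is the mean energy over the whole video,
    and the energy of the transition (t, t+1) is attributed to frame t+1.
    """
    if len(video) < 2:
        # No frame differences exist, so keep whatever is there.
        return list(range(len(video)))
    energies = motion_energy(video)
    threshold = mean(energies)
    return [t + 1 for t, e in enumerate(energies) if e >= threshold]
```

Under these assumptions, low-motion frames are dropped before feature extraction, which would reduce both the number of normals computed and the storage needed for the resulting features.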

Keywords

Motion energy, depth videos, frame selection, boundary curves, polynormal, dictionary learning

