Scalable local features and hybrid classifiers for improving action recognition

Abstract

In recent years, action recognition techniques have played an increasingly important role in autonomous systems. However, the computational cost and precision of action recognition algorithms remain major challenges. Deep learning approaches have recently been proposed to achieve higher accuracy, but large and deep neural networks incur high computational costs. This paper presents a new approach that significantly reduces computational time while slightly increasing accuracy. The contribution consists of two parts: a scalable feature extraction (SFE) method and a hybrid model combining different classifiers. First, the SFE method is proposed for histogram-of-orientation-based feature descriptors, such as the histogram of oriented gradients (HOG), the histogram of optical flow (HOF), and the motion boundary histogram (MBH). A key advantage of SFE is fast feature computation: it accurately approximates the features extracted from a traditional image pyramid while using only the original image. The method is based on a special data structure that stores the basic optical flow and image gradient information computed once from the original image; features are then extracted across multiple scales of the feature region without recomputing the image gradients or optical flow. Second, we propose a hybrid classification method, based on a linear support vector machine (SVM) and a hidden conditional random field (HCRF) model, that improves recognition precision. We show that combining the SVM and HCRF models yields better accuracy than traditional approaches. Experimental results demonstrate that the proposed approach both significantly reduces computational time and improves accuracy.
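The abstract does not specify the data structure behind SFE. The following is a minimal sketch, assuming it behaves like an integral histogram (an assumption, not the paper's confirmed design): the gradient or optical-flow field is computed once on the original frame, its magnitudes are split into one integral image per orientation bin, and the histogram of any rectangular region, at any scale, is then read with four lookups per bin. It uses NumPy; the function names, the 9-bin quantization, the 2x2 cell layout, the 16-pixel base region and the scale set (1.0, 1.5, 2.0) are hypothetical illustration choices.

```python
import numpy as np

# Hypothetical sketch: an integral histogram is ASSUMED as the SFE data structure.


def integral_orientation_histogram(vx, vy, n_bins=9, signed=False):
    """Per-bin integral images of a 2-D vector field, built once per frame.

    vx, vy: horizontal/vertical components, e.g. the image gradient (HOG)
    or an optical-flow field (HOF). Returns an array of shape
    (n_bins, H + 1, W + 1) where [b, y, x] is the summed magnitude of
    bin b over rows [0, y) and columns [0, x).
    """
    period = 2 * np.pi if signed else np.pi
    magnitude = np.hypot(vx, vy)
    angle = np.mod(np.arctan2(vy, vx), period)
    # Quantize orientation; the clamp guards the float edge case angle == period.
    bins = np.minimum((angle / period * n_bins).astype(int), n_bins - 1)
    h, w = magnitude.shape
    votes = np.zeros((n_bins, h + 1, w + 1))  # zero first row/column as padding
    for b in range(n_bins):
        votes[b, 1:, 1:] = np.where(bins == b, magnitude, 0.0)
    return votes.cumsum(axis=1).cumsum(axis=2)


def region_histogram(ii, y0, x0, y1, x1):
    """Orientation histogram of rows [y0, y1) x cols [x0, x1): 4 lookups per bin."""
    return ii[:, y1, x1] - ii[:, y0, x1] - ii[:, y1, x0] + ii[:, y0, x0]


def multiscale_descriptor(ii, cy, cx, base_size=16, scales=(1.0, 1.5, 2.0), cells=2):
    """HOG-like descriptors for square regions of several sizes around (cy, cx).

    Every scale reuses the same integral images; no gradient or flow is
    recomputed and no image pyramid is built.
    """
    _, h1, w1 = ii.shape
    height, width = h1 - 1, w1 - 1
    descriptors = []
    for s in scales:
        half = int(round(base_size * s / 2))
        y0, y1 = max(cy - half, 0), min(cy + half, height)
        x0, x1 = max(cx - half, 0), min(cx + half, width)
        ys = np.linspace(y0, y1, cells + 1).astype(int)
        xs = np.linspace(x0, x1, cells + 1).astype(int)
        d = np.concatenate([
            region_histogram(ii, ys[i], xs[j], ys[i + 1], xs[j + 1])
            for i in range(cells) for j in range(cells)
        ])
        descriptors.append(d / (np.linalg.norm(d) + 1e-12))  # L2 normalization
    return descriptors


# Usage on a random grayscale frame; np.gradient returns (d/dy, d/dx) for 2-D input.
frame = np.random.default_rng(0).random((120, 160))
gy, gx = np.gradient(frame)
ii = integral_orientation_histogram(gx, gy, n_bins=9)
descs = multiscale_descriptor(ii, cy=60, cx=80)
print([d.shape for d in descs])  # three descriptors of length 4 cells * 9 bins = 36
```

Extracting a descriptor at an extra scale then costs a fixed number of lookups per cell instead of resampling the frame and recomputing gradients, which mirrors the saving the abstract attributes to SFE. For MBH, the same builder could be applied to the spatial gradients of each flow component; the paper's actual binning, normalization and approximation scheme may differ.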

Keywords

Action recognition, hybrid classification, local feature descriptor, scalable feature extraction
