Abstract
Over the last three decades, human activity recognition has advanced considerably, driven by less expensive RGB-D cameras and the growing volume of video data. As the number of surveillance cameras increases, manual annotation becomes difficult, and the need for automatic recognition and annotation of video arises. In this paper, we introduce a computationally efficient and storage-efficient method for recognizing human activities from depth videos, together with a new frame selection method based on the mean value of motion energy. We extract normal vectors from the points on the boundary curve. Polynormals are then obtained by sequentially concatenating the normals from a neighborhood of each point on the boundary curve. The polynormals from a spatio-temporal cuboid constructed from the input video are pooled to form the Super Normal Vectors. These Super Normal Vectors are the final feature vectors, which are given as input to the classifier, a linear SVM (LIBLINEAR). Results on the MSRAction3D dataset show that the proposed algorithm is fast and that its accuracy is comparable with existing methods. The proposed method achieves an accuracy of 88% when all frames are used and 89.82% when the frame selection method is applied. The proposed method is also tested on the UTD-MHAD dataset.
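The frame selection step can be illustrated with a minimal sketch. The abstract does not define motion energy or the selection rule, so the code below makes two assumptions: motion energy of a frame is taken as the sum of absolute depth differences to the previous frame, and a frame is kept when its energy is at least the mean over the video. The function name `select_frames_by_motion_energy` is hypothetical and is not taken from the paper.

```python
import numpy as np


def select_frames_by_motion_energy(depth_video):
    """Return (kept frame indices, per-frame motion energy).

    depth_video: array of shape (T, H, W) holding depth maps.
    ASSUMPTION: motion energy of frame t (t >= 1) is the sum of absolute
    depth differences to frame t-1; the paper may define it differently.
    ASSUMPTION: a frame is kept when its energy is >= the mean energy.
    """
    video = np.asarray(depth_video, dtype=np.float64)
    if video.ndim != 3 or video.shape[0] < 2:
        raise ValueError("expected at least two frames of shape (H, W)")

    # Absolute difference between consecutive frames: shape (T-1, H, W).
    diffs = np.abs(np.diff(video, axis=0))
    # One motion energy value per frame transition: shape (T-1,).
    energy = diffs.sum(axis=(1, 2))

    # Keep frames whose motion energy reaches the mean value.
    threshold = energy.mean()
    # energy[i] belongs to frame i + 1, hence the index shift.
    keep = np.flatnonzero(energy >= threshold) + 1
    return keep, energy
```

Under these assumptions, low-motion frames are discarded before feature extraction, which is one plausible reading of how the selection reduces computation and storage.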
