Abstract
Emotions in short videos are generally conveyed through the characters' behaviour and the text content of the video. Mainstream recognition models are based on convolutional neural networks, but they have limited capacity for processing video data, insufficient recognition accuracy, and difficulty handling complex text. To address these problems, a new text-recognition emotion classification model is designed by improving the text feature fusion module of the convolutional neural network model. In addition, a new human-behaviour-recognition emotion classification model is designed by introducing a multi-head attention mechanism into a 3D convolutional neural network model. The results showed that the accuracy of the improved text recognition model was around 75%, whereas the average accuracy of the original convolutional model was only about 67%. The 3D convolutional model with the multi-head attention mechanism achieved the highest recognition accuracy, 94.5% on the UCF101 dataset, which was 12 to 39 percentage points higher than the models using other attention mechanisms. The improved convolutional network models for short-video text classification and behaviour recognition therefore outperformed the traditional models in these experiments. These results are of value for the design of classification models and can serve as a technical reference.
Keywords
