Abstract
In martial arts action recognition, complex poses, occlusions, and rapid dynamic changes reduce pose-estimation accuracy, and traditional methods suffer from inaccurate joint localization and poor recognition of continuous pose transitions. To address this, this study proposes the Multi-Scale Dilated Convolution-Attention Stacked Hourglass Network (MS-DConv-Att-SHN) for pose estimation and applies it to martial arts action recognition. First, multiple dilated convolutions and a mixed attention mechanism are introduced to reduce the computational complexity of the stacked hourglass network and mitigate its loss of joint information, strengthening its ability to capture subtle joint displacements. This is crucial for accurately estimating complex martial arts poses involving movement, flickering, and virtual-real transitions. Second, because martial arts movements are highly continuous, local feature refinement and channel fusion are used to strengthen the correlation analysis between consecutive action frames, addressing the fragmented recognition of action chains such as “exertion-contraction” that traditional methods produce. A martial arts action recognition system built on the improved MS-DConv-Att-SHN identifies individual movements and captures the intrinsic relationships between movements within routines. This provides key technical support for the digital inheritance, intelligent evaluation, and standardized promotion of martial arts, aligning the analysis more closely with the martial arts characteristic of unifying form and spirit. The results indicate that the structural improvements to the stacked hourglass network effectively increase its percentage of correct keypoints (PCK) and mean average precision (mAP) on both datasets, with PCK and mAP exceeding 92% and 85%, respectively.
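The abstract does not give layer-level details, so the following is only a minimal sketch of the general idea behind multi-scale dilated convolution: a 3×3 kernel whose taps are spaced `dilation` pixels apart covers a wider area without adding parameters, and running several dilation rates in parallel mixes fine and coarse context. The helper names, the dilation rates (1, 2, 4), and the simple averaging fusion are illustrative assumptions, not the authors' MS-DConv-Att-SHN design.

```python
import numpy as np


def effective_kernel_size(k: int, dilation: int) -> int:
    """Span covered by a k-tap kernel whose taps are `dilation` pixels apart."""
    return k + (k - 1) * (dilation - 1)


def dilated_conv2d(x: np.ndarray, w: np.ndarray, dilation: int) -> np.ndarray:
    """Single-channel 2-D cross-correlation with dilation and 'same' zero padding.

    Assumes an odd, square kernel `w` so the output keeps the shape of `x`.
    """
    k = w.shape[0]
    pad = dilation * (k - 1) // 2
    xp = np.pad(x, pad)
    h, wd = x.shape
    out = np.zeros_like(x, dtype=float)
    for i in range(k):
        for j in range(k):
            # Each kernel tap reads a shifted view of the padded input;
            # the shift grows with the dilation rate.
            out += w[i, j] * xp[i * dilation : i * dilation + h,
                                j * dilation : j * dilation + wd]
    return out


def multi_scale_dilated_block(x: np.ndarray, w: np.ndarray,
                              rates=(1, 2, 4)) -> np.ndarray:
    """Hypothetical multi-scale block: apply the same kernel at several
    dilation rates in parallel and average the branch outputs."""
    branches = [dilated_conv2d(x, w, d) for d in rates]
    return np.mean(branches, axis=0)


if __name__ == "__main__":
    # An impulse input makes the spacing of the kernel taps visible.
    x = np.zeros((9, 9))
    x[4, 4] = 1.0
    w = np.ones((3, 3))
    for d in (1, 2, 4):
        rows = sorted(set(np.nonzero(dilated_conv2d(x, w, d))[0].tolist()))
        print(d, effective_kernel_size(3, d), rows)
```

With the same nine weights, the span grows from 3 to 5 to 9 pixels as the dilation rate goes from 1 to 2 to 4, which is why dilated branches can enlarge the receptive field without the extra cost of larger dense kernels.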
The MS-DConv-Att-SHN model achieves higher average recognition accuracy for pose keypoints than the other comparison models, by a margin of more than 1.2%. The improved model also exceeds 90% recognition accuracy across different martial arts movements, with a smaller parameter count and higher PCK values than the comparison models. The proposed method can provide technical support for the automated analysis of martial arts movements, sports training assistance systems, and intelligent martial arts teaching.
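PCK, one of the headline metrics above, counts a predicted keypoint as correct when its distance to the ground truth is within a fraction α of a per-sample reference length. The abstract does not state which reference length or α the authors use (common choices are the head segment with α = 0.5 for PCKh@0.5, or torso/bounding-box size with α = 0.2), so the sketch below leaves both as parameters; the function name `pck` and the optional visibility mask are illustrative assumptions.

```python
import numpy as np


def pck(pred, gt, ref_len, alpha=0.5, visible=None):
    """Percentage of Correct Keypoints (hypothetical helper).

    pred, gt: arrays of shape (N, K, 2) holding predicted / ground-truth (x, y).
    ref_len:  array of shape (N,) with a per-sample normalising length.
    visible:  optional (N, K) boolean mask; only visible keypoints are scored.
    A keypoint is correct when its Euclidean error <= alpha * ref_len.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    dist = np.linalg.norm(pred - gt, axis=-1)                # (N, K) errors
    thresh = alpha * np.asarray(ref_len, dtype=float)[:, None]
    correct = dist <= thresh
    if visible is None:
        visible = np.ones_like(correct, dtype=bool)
    visible = np.asarray(visible, dtype=bool)
    return 100.0 * correct[visible].mean()


if __name__ == "__main__":
    gt = np.zeros((1, 4, 2))
    # Errors of 0, 5, 10 and 1 pixels against a threshold of 0.5 * 10 = 5.
    pred = np.array([[[0, 0], [3, 4], [6, 8], [1, 0]]], dtype=float)
    print(pck(pred, gt, ref_len=[10.0]))
```

In this toy case three of the four keypoints fall within the threshold, giving a PCK of 75%; masking out the third (e.g. an occluded joint) raises it to 100%, which is why reported PCK values depend on how occluded keypoints are handled.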
