Abstract
Existing short-video content analysis methods often struggle to integrate multimodal data, resulting in incomplete content understanding and limited recommendation accuracy. To address these challenges, this study proposes a deep learning-based short-video content analysis and recommendation algorithm that enhances feature extraction and personalization. A convolutional neural network (CNN) and a long short-term memory (LSTM) network automatically capture multimodal features from short videos, which are combined into a comprehensive representation through weighted fusion. Meanwhile, a deep neural network (DNN) models user behavior, extracting deep behavioral features to improve preference prediction. By jointly modeling video content and user behavior, a multimodal recommendation algorithm is developed to optimize content delivery. Experimental results show that, compared with collaborative filtering (CF), matrix factorization (MF), and content-based filtering baselines, the proposed method improves average precision by 4.7%, 3.3%, and 4.0%, and recall by 3.5%, 1.2%, and 2.1%, respectively. These findings confirm that the deep learning-driven approach effectively enhances multimodal content understanding, captures personalized user preferences, and significantly improves recommendation accuracy, offering a more intelligent and adaptive short-video recommendation framework.
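To make the weighted-fusion step concrete, the following is a minimal sketch of how per-modality feature vectors might be combined into a single representation. The feature dimensions, modality names, and fusion weights here are illustrative assumptions, not values taken from the paper; in the actual system the modality features would come from the CNN and LSTM encoders and the weights would typically be learned.

```python
import numpy as np

# Hypothetical fixed-size modality features (the paper does not specify dimensions).
rng = np.random.default_rng(0)
visual_feat = rng.random(128)    # e.g., CNN features from video frames (assumed)
temporal_feat = rng.random(128)  # e.g., LSTM features over the frame sequence (assumed)
audio_feat = rng.random(128)     # e.g., features from the audio track (assumed)

def weighted_fusion(features, weights):
    """Fuse same-dimensional modality features via a normalized weighted sum.

    Weights are renormalized to sum to 1 so the fused vector stays on the
    same scale as the inputs regardless of the raw weight values.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * f for wi, f in zip(w, features))

# Illustrative weights only; a trained model would learn these.
fused = weighted_fusion([visual_feat, temporal_feat, audio_feat], [0.5, 0.3, 0.2])
```

The fused vector would then be passed, together with the DNN's behavioral features, into the joint recommendation model.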
