Abstract
Current short video recommendation systems face challenges in accurately identifying user interests from large-scale, heterogeneous user data, as well as cold-start problems when encountering new users and content. This study proposes a deep learning and reinforcement learning (DLRL) framework to enhance recommendation performance. User and video data are collected and preprocessed on the Hadoop platform, while automated metadata extraction tools, optical character recognition (OCR), and audio-to-text technologies extract multimodal features, including subtitles and audio content. A convolutional neural network (CNN) extracts semantic and visual features from text-image data, while a long short-term memory (LSTM) network captures temporal dependencies in user behavior to detect interest changes. A multimodal attention mechanism integrates these diverse features into comprehensive user portraits, and a deep neural network (DNN) reduces feature dimensionality to represent user preferences compactly and accurately. The recommendation process is formulated as a reinforcement learning task optimized with the Deep Q-Network (DQN) and Proximal Policy Optimization (PPO) algorithms. Experimental results demonstrate that the proposed approach achieves a superior click-through rate (6.22%), viewing duration (73.31 s), accuracy (0.877), and recall (0.858). These findings indicate that combining deep learning with reinforcement learning improves the accuracy of short video recommendations and the user experience, overcoming cold-start limitations and dynamically adapting to evolving user interests.
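The abstract does not specify how the DQN formulation maps recommendation onto reinforcement learning, so the following is only a minimal illustrative sketch: the user portrait serves as the state, candidate videos as actions, and a simulated click as the reward. All names (`select_video`, `td_update`), the linear Q-function, and the dimensions are assumptions for illustration, not the paper's implementation.

```python
# Illustrative sketch: recommendation framed as Q-learning.
# State = user-portrait feature vector, actions = candidate videos,
# reward = simulated click. Linear Q-function; all settings are assumed.
import numpy as np

rng = np.random.default_rng(0)

N_FEATURES = 8  # dimensionality of the user-portrait vector (assumed)
N_VIDEOS = 5    # number of candidate videos, i.e. actions (assumed)

# Linear Q(s, a): one weight vector per candidate video.
W = np.zeros((N_VIDEOS, N_FEATURES))

def select_video(state, epsilon=0.2):
    """Epsilon-greedy action selection over candidate videos."""
    if rng.random() < epsilon:
        return int(rng.integers(N_VIDEOS))
    return int(np.argmax(W @ state))

def td_update(state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One Q-learning step: move Q(s,a) toward r + gamma * max_a' Q(s',a')."""
    target = reward + gamma * np.max(W @ next_state)
    td_error = target - W[action] @ state
    W[action] += alpha * td_error * state

# A fixed, normalized user portrait (static in this toy example).
state = rng.normal(size=N_FEATURES)
state /= np.linalg.norm(state)

# Warm-up: one update per candidate; the user "clicks" only video 2.
for a in range(N_VIDEOS):
    td_update(state, a, 1.0 if a == 2 else 0.0, state)

# Interaction loop: the learned policy comes to favor the clicked video.
for _ in range(300):
    a = select_video(state)
    reward = 1.0 if a == 2 else 0.0
    td_update(state, a, reward, state)

print(int(np.argmax(W @ state)))  # -> 2, the video that earns clicks
```

A full DQN would replace the linear Q-function with a neural network and add experience replay and a target network; this sketch only shows the state-action-reward framing the abstract describes.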
