Abstract
With the rapid advancement of large language models (LLMs) and computer vision technologies, multimodal large language models (MLLMs) have demonstrated remarkable potential in sentiment analysis. Traditional sentiment analysis methods often rely on unimodal data (e.g., text or images), making it difficult to comprehensively capture complex emotional expressions. This paper proposes a multimodal sentiment analysis framework based on MLLMs, integrating visual and textual information to enhance sentiment classification accuracy. Experiments on a high-profile social media event show that the Qwen2-VL-Adapter model outperforms conventional methods on multiple evaluation metrics, validating the effectiveness of multimodal information fusion. This study provides a robust technical framework for sentiment analysis in public opinion monitoring and offers valuable data support for crisis management. However, the model's performance is influenced by the specificity of the dataset and by computational demands, which may limit its application in resource-constrained environments.
