Abstract
The crossing behavior of non-motorized lane users at road crossings plays a critical role in both personal safety and overall traffic efficiency. This study proposes a language-model-based framework for analyzing that behavior. First, images are preprocessed with the YOLOv8 algorithm to accurately detect key body parts, including the head, hands, and legs. An optimized prompting strategy is then integrated with a visual large language model (LLM) to adapt it to pedestrian-related tasks. In addition, a chain-of-thought inference module is incorporated to strengthen the model’s reasoning ability, thereby improving behavior classification and risk assessment. Experimental results show that, under zero-shot learning, the proposed model improves accuracy by 11.74% compared with other LLMs, and under few-shot learning, it surpasses traditional neural networks by 2.97%. These findings demonstrate that the method not only enhances the accuracy and robustness of pedestrian crossing behavior recognition but also maintains strong performance in data-scarce scenarios, offering valuable support for improving road safety.
