Abstract
Rapid urbanization and the growing demand for public safety provide strong impetus for multi-pedestrian tracking in surveillance systems. However, multi-pedestrian tracking still faces several critical challenges: (1) scenes with occluded or densely packed targets are inherently complex; (2) detection remains difficult for targets that exhibit significant foreground–background contrast and subtle appearance features under occlusion; and (3) occluded or crowded targets often overlap heavily, which degrades both spatial and appearance features and makes it harder to maintain identity consistency across frames. To address these challenges, we propose CueTrack, an enhanced tracking-by-detection framework. CueTrack integrates a novel detection module, FlexDet, which combines a linear deformable convolution (LDConv) with a high-resolution detection layer, and introduces a confidence-based modeling strategy for more robust target association. In particular, unlike existing methods that rely solely on spatial or visual cues, our confidence-based approach adaptively compensates for the blurriness caused by frequent occlusion and crowded scenes. Extensive experiments on the challenging MOT17 and MOT20 datasets demonstrate the effectiveness of CueTrack, which achieves 80.5 multi-object tracking accuracy (MOTA), 81.4 identification F1-score (IDF1), and 65.2 higher-order tracking accuracy (HOTA) on the MOT17 dataset. These results validate its strengths in detection accuracy and identity association and highlight its potential for real-world applications.
