CueTrack: Weak-Cue-Enhanced and Consistency-Aligned Framework for Robust Multi-Pedestrian Tracking

Abstract

Rapid urbanization and the growing demand for public safety are driving the need for multi-pedestrian tracking in surveillance systems. However, multi-pedestrian tracking still faces several critical challenges: (1) scenes with occluded or densely packed targets are inherently complex; (2) detection remains difficult for targets that exhibit significant foreground–background contrast and subtle appearance features under occlusion; and (3) occluded or crowded targets often overlap heavily, which degrades both spatial and appearance features and makes it harder to maintain identity consistency across frames. To address these challenges, we propose CueTrack, an enhanced tracking-by-detection framework. CueTrack integrates a novel detection module, termed FlexDet, which combines linear deformable convolution (LDConv) with a high-resolution detection layer, and introduces a confidence-based modeling strategy for more robust target association. Unlike existing methods that rely solely on spatial or visual cues, our confidence-based approach adaptively compensates for the blurred cues caused by frequent occlusion and crowding. Extensive experiments on the challenging MOT17 and MOT20 datasets demonstrate the effectiveness of CueTrack, which achieves 80.5 multi-object tracking accuracy (MOTA), 81.4 identification F1-score (IDF1), and 65.2 higher-order tracking accuracy (HOTA) on the MOT17 dataset. These results not only validate its superiority in detection accuracy and identity association, but also highlight its potential for real-world applications.

Keywords

Multi-Object Tracking, Multi-Pedestrian Tracking, Object Detection, Data Association, Occlusion Recovery
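
For readers unfamiliar with the metrics reported in the abstract, the following minimal sketch shows how MOTA and IDF1 are commonly defined from accumulated error counts, following the standard CLEAR MOT and identity-measure definitions. The function names and the example counts are illustrative assumptions, not values or code from this article. HOTA is omitted because it additionally averages over localization thresholds.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """CLEAR MOT accuracy: 1 - (FN + FP + IDSW) / GT.

    All counts are summed over every frame of the sequence;
    num_gt is the total number of ground-truth boxes.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def idf1(idtp, idfp, idfn):
    """Identification F1: 2*IDTP / (2*IDTP + IDFP + IDFN).

    The ID true/false positives and false negatives come from the
    identity-level matching between predicted and ground-truth tracks.
    """
    denom = 2 * idtp + idfp + idfn
    return 2 * idtp / denom if denom else 0.0


# Hypothetical counts, chosen only to illustrate the arithmetic.
example_mota = mota(false_negatives=120, false_positives=60,
                    id_switches=20, num_gt=1000)
example_idf1 = idf1(idtp=800, idfp=100, idfn=200)
print(f"MOTA = {100 * example_mota:.1f}, IDF1 = {100 * example_idf1:.1f}")
```

Both metrics are conventionally reported as percentages, which is how scores such as those quoted for MOT17 should be read. MOTA is dominated by detection errors, whereas IDF1 emphasizes whether identities are preserved over time, so the two are usually reported together.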
