TPT-Bench: A large-scale, long-term and robot-egocentric dataset for benchmarking target person tracking

Abstract

Tracking a target person from robot-egocentric views is crucial for developing autonomous robots that provide continuous personalized assistance or collaboration in human–robot interaction (HRI) and Embodied AI. However, most existing target person tracking (TPT) benchmarks are limited to controlled laboratory environments with few distractions, clean backgrounds, and short-term occlusions. In this paper, we introduce a large-scale dataset designed for TPT in crowded and unstructured environments, demonstrated through a robot-person following task. The dataset is collected by a human pushing a sensor-equipped cart while following a target person, capturing human-like following behavior and emphasizing long-term tracking challenges such as frequent occlusions and the need to re-identify the target among numerous pedestrians. It provides multi-modal data streams (odometry, 3D LiDAR, IMU, panoramic images, and RGB-D images), along with exhaustively annotated 2D bounding boxes of the target person across 48 sequences, both indoors and outdoors. Using this dataset and its visual annotations, we perform extensive experiments with existing state-of-the-art (SOTA) TPT methods, offering a thorough analysis of their limitations and suggesting future research directions. Our dataset, code, and video are available at https://medlartea.github.io/tpt-bench/.

Keywords

applications and data papers; data sets for robotic vision; human–robot interaction; multi-modal perception for HRI; robot companions; vision and sensor-based control; visual perception and learning; visual tracking


