Abstract
Mobile robots often follow humans in warehouse environments or indoor office spaces. A mobile robot requires a control system that stably follows humans without colliding with them. A human-following robot is composed of target detection, target tracking, and control. Conventional 2D laser-based human-following systems exploit 2D planar laser data to detect human legs through machine learning methods such as support vector data description and random forest. However, 2D LiDAR data lacks features that distinguish human legs from obstacles, so false-positive detections become a problem in crowded or cluttered environments. Recent studies using 3D LiDAR mount the sensor at an elevated height to measure the overall shape of a person and extract features; on typical mobile robots, however, sensors are mounted at a low position to avoid vibration, so such approaches incur the cost of additional sensors. We propose a 3D laser-based human leg detection and tracking framework that improves the robustness of human following for autonomous mobile robots and detects human legs with a LiDAR sensor mounted at a low position, without additional sensors. With a deep learning-based human leg detector built on the Point-Voxel RCNN (PV-RCNN) model, the proposed 3D human leg tracking system helps robots robustly follow humans in cluttered and crowded environments. Additionally, we demonstrate the robustness of our method in a practical cluttered environment by comparing its performance with that of a conventional human leg detection and following system.
Introduction
Autonomous mobile robots can be used in diverse fields and, thus, have endless applications. Many industries leverage human–robot interaction, and robots significantly contribute to increasing work productivity or supplementing human capabilities.1–4 Many sensors and techniques can help robots recognize objects and people around them. The most commonly used sensors are vision and laser sensors, for example, cameras and light detection and ranging (LiDAR). Recently, Miao and Liu 5 and Gelfert 6 proposed detecting people using only cameras. However, vision sensors are sensitive to many environmental conditions, for example, changes in lighting; low light and motion blur result in poor image quality in dark or dynamic environments. Laser sensors such as LiDAR are less accessible than vision sensors; however, they measure distances accurately and, unlike cameras, are robust across a wide range of environmental conditions.
Many studies have explored how human-following robots can detect and track people using various sensors.7–9 Yuan et al. 7 constructed a tracking system that integrates information from vision and laser sensors based on a particle filter, enabling a mobile robot to follow a target human. Wu et al. 8 designed a module that enables a mobile robot to follow specific objects, such as other robots or humans, using an ultrahigh-frequency radio frequency identification (RFID) device. However, RFID devices have the inconvenience of requiring special tags to be attached to the objects to be followed. Many studies detect humans by assuming their legs have certain shapes. However, human legs observed by 2D laser sensors are hard to classify because their 2D shape is similar to that of obstacles such as chairs. Xavier et al. 10 assumed the shape of human legs to be a circular arc and calculated geometric features, such as the radius and the inscribed angle of the circle, for detection. Other studies11–15 analyzed not only the geometric shape of the legs but also additional information within the leg data, typically in two main steps: feature extraction followed by classification. Chung et al. 11 classified human legs by calculating the width, depth, and girth of the leg data, which were used to learn the data distribution with support vector data description. 16 Li et al. 12 defined spatial relationships among geometric features and learned different feature types with AdaBoost 17 to reduce the false-positive detection rate in unavoidable situations such as occlusion. Leigh et al. 13 used geometric information in laser clusters to train a random forest (RF) classifier to separate human leg clusters from non-leg clusters. Wang et al. 14 proposed an adaptive-switch decision tree that improves detection performance under various noise conditions, addressing the false-positive detection issues of the standard RF by differentiating noise-sensitive from noise-insensitive features. Despite this large body of work, 2D LiDAR data has inherent limitations for detecting human legs: it lacks features that distinguish legs from obstacles, and in human-following systems this insufficient detector performance leads to unreliable tracking.
Deep learning-based object detection methods using 3D laser data can be broadly classified into two categories: single-stage detectors and two-stage detectors. Single-stage detectors calculate the class and location of an object simultaneously and require little enough computation to ensure real-time operation.18,19 Unlike single-stage detectors, two-stage detectors20,21 require two steps to detect objects. First, they generate regions of interest (RoIs) through a region proposal network that indicates locations in the input data with a high probability of containing an object. In the second step, class prediction and box regression are performed using the generated RoIs.
Object detection methods using 3D laser point cloud data can be further classified into point-based and grid-based methods. Conventional deep learning frameworks are well suited to ordered input data such as 2D images or 3D voxels. Qi et al. 22 proposed a novel deep learning framework for object classification and segmentation that takes unordered 3D raw point cloud data as input. While point-based methods encode spatial information directly from 3D raw point cloud data and detect objects using the PointNet framework,22,23 grid-based methods preprocess the 3D raw point cloud into ordered representations, such as 3D voxels or a 2D bird's-eye-view (BEV) feature map, suitable for conventional deep learning frameworks. Shi et al. 21 proposed the Point-Voxel RCNN (PV-RCNN), a framework that integrates grid- and point-based methods and achieves state-of-the-art detection accuracy in autonomous driving environments. In this work, the PV-RCNN model 21 is used to detect the shape of 3D human legs from 3D raw laser data, with the aim of enabling a human-following robot to robustly track people. However, the heavy computational burden of the PV-RCNN model makes real-time detection challenging.
With the increase in access to 3D laser sensors, many studies have targeted object detection using 3D laser data.24–26 Chen et al. 26 use a support vector machine to detect humans from 3D laser sensor data. However, on typical mobile robots, the sensors are placed at a low position due to vibration problems. Since the method of Chen et al. 26 requires point data from both the body and the legs, additional sensors must be installed at a higher level, which increases cost. Jia and Leibe 27 proposed Person-MinkUNet, a 3D person detection network based on the Minkowski Engine and the U-Net architecture that applies submanifold sparse convolution. Yan et al. 28 introduced an online learning system for human classification by mobile robots using 3D LiDAR sensors, requiring minimal labeled data and leveraging real-time clustering and automated sample generation. The RPEA network 29 features a residual path architecture to retain spatial information and an efficient channel attention module to suppress noise in 3D point clouds, improving detection accuracy. However, these recent studies28,29 focus on identifying people from their overall body shape. On a typical mobile robot, the LiDAR sensor is generally mounted at a height between 0.35 and 0.45 m from the ground, and for interaction, people and mobile robots need to be within 2 m of each other. At that range, such a LiDAR typically detects only the person's legs. In this study, we propose a 3D human leg tracking system using an adaptive search space PV-RCNN model. The real-time performance of the PV-RCNN model is improved by reducing its computational time with a target pose-based search space adjustment method. We also adopt open-loop control to enable the mobile robot to follow the target in real time. Our main contributions are summarized as follows.
We utilized the PV-RCNN model to learn the 3D human leg shape, improving the human leg detection performance of the human leg tracking system for mobile robots. To reduce the computational cost of the PV-RCNN deep learning model and improve its real-time detection capability, we implemented a target pose-based search space.
3D human leg tracking system using the adaptive search space PV-RCNN model
This study proposes a human leg detection and tracking system using PV-RCNN. Figure 1 shows a flowchart of the proposed human tracking system, which can be divided into three major processes: (a) the detector, (b) the tracker, and (c) the motion controller of the mobile robot. In the detector, human legs are detected from the input point cloud through a pre-trained PV-RCNN-based deep learning model. The detections are then transferred to the tracker, where they undergo a series of tracking processes that ultimately update the tracking data. After the target to be followed is identified, the velocity of the mobile robot is calculated from the target's positional information to control the robot so that it safely follows the target.

3D human leg tracking system framework.
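As a high-level sketch, the loop below illustrates how these three stages could be composed. The detector, tracker, controller, and robot objects and all of their method names are hypothetical placeholders for exposition, not the authors' implementation.

```python
# Minimal sketch of the detect -> track -> control loop.
# All objects and method names are hypothetical placeholders.
def follow_human_loop(lidar, robot, detector, tracker, controller):
    while robot.is_active():
        cloud = lidar.read_point_cloud()          # 3D raw point cloud
        detections = detector.detect_legs(cloud)  # PV-RCNN-based leg detector
        tracks = tracker.update(detections)       # Kalman filter + GNN association
        target = tracker.select_target(tracks)    # pick the human to follow
        if target is not None:
            v, w = controller.compute_velocity(target.pose)
            robot.set_velocity(v, w)              # command the mobile base
        else:
            robot.stop()                          # no target: stay safe
```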
3D human leg detector
The PV-RCNN model has the limitation of slow inference for real-time object detection. To address this, we implemented a preprocessing step before the raw point cloud enters PV-RCNN. This preprocessing step reduces the computational burden while ensuring robustness, thereby enabling real-time object detection.
Figure 2 shows a flowchart of the preprocessing of 3D raw point cloud data that reduces the inference time of the PV-RCNN model when detecting human legs. The initial search space is defined by setting a boundary in front of the robot. When a target human is detected, the boundary is reset around the location of this target. If detection fails, the search space expands back to the initial boundary to re-identify the target human. By configuring this adaptive search space, we reduce the amount of raw point cloud data that requires computation, thereby improving real-time detection with the PV-RCNN model. Furthermore, setting the search space around the detected target enables robust tracking.

3D point cloud preprocessing flow chart.
Figure 3 illustrates an example of the search space. The left image represents the search space when no human is present, and the right image demonstrates the reduced search space when a target human is detected. Since the robot follows the person in front of it, the initial search space is defined as a square relative to the robot's forward direction. When a person is detected, the search space is adjusted to a circular region centered on the person's position coordinates calculated by the network. The circular search space is smaller than the square one, resulting in lower computational cost. PV-RCNN accepts 3D laser data as input, which is first voxelized. Using 3D sparse convolution layers, the downsampled voxels are transformed into a 2D BEV feature map. A 3D RoI is then proposed and refined using keypoints, ultimately outputting a bounding box.

Example of search space. (a) Front view search space. (b) Target pose-based search space.
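A minimal sketch of this adaptive search space is given below, assuming illustrative parameters: the 4 m square, the 1.5 m radius, and the robot-frame convention (x pointing forward) are our assumptions, not the paper's values.

```python
import numpy as np

def crop_search_space(points, target_xy=None, box_size=4.0, radius=1.5):
    """Keep only points inside the current search space.

    points    : (N, 3) array of x, y, z in the robot frame (x forward).
    target_xy : (x, y) of the last tracked target, or None if no target yet.
    """
    if target_xy is None:
        # Initial search space: a square region in front of the robot.
        keep = (points[:, 0] > 0.0) & (points[:, 0] < box_size) \
             & (np.abs(points[:, 1]) < box_size / 2.0)
    else:
        # Target found: shrink to a circle around the last target position.
        dist = np.linalg.norm(points[:, :2] - np.asarray(target_xy), axis=1)
        keep = dist < radius
    return points[keep]
```

Because the cropped cloud is much smaller than the full scan, the voxelization and convolution stages of PV-RCNN operate on far fewer points, which is where the inference-time saving comes from.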
Figure 4 depicts a simplified illustration of the process by which the PV-RCNN model detects human legs from 3D laser data. The mobile robot recognizes humans using the designed detector, which takes the preprocessed point cloud data as input.

PV-RCNN framework. 21
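As a concrete illustration of the first stage of this pipeline, the snippet below shows one simple way to voxelize a raw point cloud with NumPy. The 0.05 m voxel size is an assumed value, and keeping one representative point per voxel is only a simplification of what voxel feature encoding actually does:

```python
import numpy as np

def voxelize(points, voxel_size=0.05):
    """Map each 3D point to an integer voxel index.

    Returns the occupied voxel coordinates and one representative
    point per voxel (a simplification of real voxel feature encoding).
    """
    coords = np.floor(points[:, :3] / voxel_size).astype(np.int64)
    # np.unique over rows deduplicates voxel indices; return_index
    # gives the position of one representative point per voxel.
    _, rep_idx = np.unique(coords, axis=0, return_index=True)
    return coords[rep_idx], points[rep_idx]
```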
Tracking of multiple human legs
The proposed 3D laser-based human leg tracking system is designed with a simple, basic tracker. While the tracker's performance matters for the overall human tracking system, it is the detector's performance, through the tracked data it updates, that determines the robustness of the entire system. Therefore, by using the highly accurate PV-RCNN model, we were able to design a robust human tracking system with only a basic tracker. The tracker combines a Kalman filter, which predicts the future state of the tracked data, with a global nearest neighbor (GNN) data association method, which solves the association problem between detected data and tracked data.
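The following is a minimal sketch of such a constant-velocity Kalman filter for a single tracked leg position; the state layout [x, y, vx, vy] and the noise magnitudes are illustrative assumptions, not the paper's tuned values:

```python
import numpy as np

class LegTrack:
    """Constant-velocity Kalman filter for one leg position (sketch)."""

    def __init__(self, xy, dt=0.1):
        self.x = np.array([xy[0], xy[1], 0.0, 0.0])  # state [x, y, vx, vy]
        self.P = np.eye(4)                            # state covariance
        self.F = np.array([[1, 0, dt, 0],             # constant-velocity model
                           [0, 1, 0, dt],
                           [0, 0, 1,  0],
                           [0, 0, 0,  1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],              # we observe position only
                           [0, 1, 0, 0]], dtype=float)
        self.Q = 0.01 * np.eye(4)                     # process noise (assumed)
        self.R = 0.05 * np.eye(2)                     # measurement noise (assumed)

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]                             # predicted position

    def update(self, z):
        y = np.asarray(z) - self.H @ self.x           # innovation
        S = self.H @ self.P @ self.H.T + self.R       # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)      # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
```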
The GNN method is used to perform the data association process between detected data and tracked data, as in the 2D human leg tracking system of Leigh et al. 13
At the current timestep, the Kalman filter predicts the state of each tracked leg, and the predicted positions are compared against the newly detected positions.
For a mobile robot to robustly follow a human, it is essential to have management tasks that create and remove tracking data using detection data. Approaches such as tracking all detection data as humans, or continuously maintaining tracking data that has not been detected for a certain period, can compromise the robustness of the system, since we cannot always be confident that the detector's output is reliable. Therefore, to ensure the robustness of the tracking system, tracking data is created only when detections are consistently confirmed, and tracks that go undetected for a certain period are removed.
To match the tracking data at the current timestep with the detection data, the GNN method builds a cost matrix of distances between the predicted track positions and the detections and selects the globally optimal assignment.
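A minimal sketch of such GNN association is shown below, realized with the Hungarian algorithm via SciPy's linear_sum_assignment; the Euclidean cost and the 1 m gating threshold are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(track_positions, detections, gate=1.0):
    """Globally optimal track-to-detection assignment (sketch).

    track_positions : (T, 2) predicted track positions.
    detections      : (D, 2) detected leg positions.
    Returns a list of (track_index, detection_index) pairs.
    """
    tracks = np.asarray(track_positions, dtype=float)
    dets = np.asarray(detections, dtype=float)
    # Cost matrix of pairwise Euclidean distances, shape (T, D).
    cost = np.linalg.norm(tracks[:, None, :] - dets[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)  # minimize total cost
    # Discard pairs farther apart than the gate distance.
    return [(t, d) for t, d in zip(rows, cols) if cost[t, d] < gate]
```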
Motion control of a mobile robot for target-following
To safely and robustly follow a target, a mobile robot must define a safe distance to maintain from the target. Figure 5 illustrates the posture error between the mobile robot and the target human, computed from the target posture.

Posture error between the mobile robot and the target human from the target posture.
The velocity command of the mobile robot is computed from this posture error, reducing the distance and heading errors so that the robot follows the target while keeping the safe distance.
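As an illustration of this step, a simple proportional controller of the kind described here might look as follows; the gains, velocity limits, and safe distance are assumed values, not the authors' parameters:

```python
import numpy as np

def compute_velocity(target_xy, safe_distance=0.8, k_v=0.5, k_w=1.2,
                     v_max=0.6, w_max=1.0):
    """Proportional follow controller (sketch, assumed gains/limits).

    target_xy : target position (x, y) in the robot frame (x forward).
    Returns (linear velocity, angular velocity).
    """
    x, y = target_xy
    distance = np.hypot(x, y)          # range to the target
    bearing = np.arctan2(y, x)         # heading error to the target
    # Drive forward only while farther than the safe distance.
    v = np.clip(k_v * (distance - safe_distance), 0.0, v_max)
    w = np.clip(k_w * bearing, -w_max, w_max)
    return v, w
```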
Experiments and results
In this experiment, the proposed deep learning-based 3D human leg tracking system was evaluated against the issue of performance degradation in 2D data-based human leg tracking systems caused by increased false-positive detections in cluttered environments. To evaluate the tracking system, we conducted human-tracking experiments in both clear and cluttered environments. We compared the leg detection performance of our proposed method with that of Leigh et al. 13 Their method builds on the leg detector package of the Robot Operating System (ROS) 30 and trains target features with an RF classifier, reducing the false-positive rate compared with the ROS leg detector package. 30 We adopted a Velodyne VLP-16 3D LiDAR to obtain 3D point cloud data, which was processed on a laptop to conduct the human leg tracking and robot-following experiments. Figure 6 shows the mobile robot equipped with the 3D LiDAR and laptop. The laptop's specifications were an Intel Core i7-7700HQ CPU and an Nvidia GeForce GTX 1070 GPU.

Mobile robot with 3D LiDAR.
Evaluation metrics of human leg detection
Precision and recall, widely used evaluation metrics for detection performance, were computed to evaluate the detector. Precision indicates how accurate the detector's positive results are, and recall indicates how completely the detector finds actual objects without omission. Equations (4) and (5) calculate precision and recall, respectively, from the numbers of true positives (TP), false positives (FP), and false negatives (FN):

Precision = TP / (TP + FP)        (4)

Recall = TP / (TP + FN)        (5)
Experimental results of leg tracking in a clear environment
Figure 7 shows the scenario in which the robot followed the target human in a clear environment. The image on the left shows the actual robot and human. The image in the middle shows human leg detection using the method of Leigh et al., 13 and the image on the right shows the proposed system detecting human legs from raw point cloud data.

Results of human leg detection with different methods in a clear environment. (a) Real-world environment. (b) Tracking results. 13 (c) Our tracking results.
Figure 8 shows the precision–recall curves obtained by varying the detection threshold of the two systems' detectors. The solid line (labeled "Proposed 3D system") represents our PV-RCNN-based 3D human leg tracking method, and the dotted line (labeled "Leigh et al. 13 ") represents the 2D laser-based human leg tracking method of Leigh et al. 13 When evaluating the performance of a detection system, precision and recall should be considered simultaneously; therefore, both metrics were analyzed together.

Precision–recall curves of different methods in a clear environment.
Table 1 shows that both systems achieved a precision of over 98% and a recall close to 98% when detecting humans. Moreover, even when our proposed 3D system momentarily fails to detect the target, it re-detects the target in the next loop and does not lose it. Thus, both systems demonstrated satisfactory, reliable performance when following a target human in a clear environment.
Comparison of best detection performance in a clear environment.
Experimental results of leg tracking in a cluttered environment
Figure 9 shows the results of evaluating the tracking systems' detection capabilities. In this experiment, the mobile robot navigated a complex office environment in which obstacles resembling human legs, such as chair and table legs, were relatively abundant. The image on the left shows the actual robot and human. The image in the middle shows human leg detection by the method of Leigh et al., 13 and the image on the right shows the proposed system detecting human legs from the 3D raw point cloud.

Results of human leg detection with different methods in a cluttered environment. (a) Real-world environment. (b) Tracking results with false-positive detection from Leigh et al. 13 (c) Tracking results of the proposed system.
Figure 10 shows the precision–recall curves of the two methods in a cluttered environment. The average precision (AP) is used to quantitatively compare the performance of detection systems. The area under the precision–recall curve gives the AP value, with a larger area indicating better performance.

Precision–recall curves of different methods in a cluttered environment.
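For illustration, the AP can be approximated as the trapezoidal area under sampled precision–recall points; the values in the usage comment below are made up:

```python
import numpy as np

def average_precision(recalls, precisions):
    """Trapezoidal area under the precision-recall curve (sketch)."""
    order = np.argsort(recalls)                # sort points by recall
    r = np.asarray(recalls, dtype=float)[order]
    p = np.asarray(precisions, dtype=float)[order]
    return np.trapz(p, r)                      # area under the PR curve

# Example with made-up values:
# average_precision([0.2, 0.5, 0.8], [0.95, 0.90, 0.80])  # -> ~0.53
```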
As seen in Table 2, the 2D laser-based system 13 exhibited an increased false-positive detection rate, reducing the detector's precision to 43.95%; in contrast, the proposed system achieved a recall of 77.61% and a precision of 81.99%, indicating superior performance compared with the 2D laser-based system.
Comparison of best detection performance in cluttered environment.
In a clear environment, both the method of Leigh et al. 13 and the proposed 3D system performed well in target tracking. In a cluttered environment, however, the proposed 3D system outperformed the method of Leigh et al., 13 indicating that it is more useful in real office environments.
Table 3 compares inference times using the preprocessed data based on the adaptive search space. When detecting human legs, the conventional PV-RCNN took 0.222 s, whereas our system using the target pose-based search space took 0.205 s, a reduction in inference time of approximately 8%.
Comparison of inference time.
Conclusion
In this study, we proposed a 3D human leg tracking system that uses a PV-RCNN-based 3D object detection method to detect 3D human legs and enhance the detection performance of the human leg detector in cluttered environments. The system enables more robust human-following for mobile robots. A significant issue with 2D laser-based human tracking systems is that, in complex environments containing many objects that closely resemble leg shapes, the probability of false detections increases, degrading tracking performance. To address this issue, we replaced the 2D laser-based detector with a 3D laser-based detector that uses the PV-RCNN model, a deep learning-based 3D object detection framework, to learn the shape of human legs and detect them from 3D laser data. Through experiments in cluttered environments, such as offices with many leg-like obstacles, we confirmed a reduction in the false-positive detection rate of the human leg detector and an improvement in the performance of the tracking system. The proposed human tracking system will help solve problems such as mobile robots losing the target human and will improve the overall human-following performance of mobile robots. In future work, we will develop a network that fuses 3D LiDAR point cloud data with RGB-D camera images for more robust target tracking.
Footnotes
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
