Abstract
In autonomous systems and robotics, acoustic signals provide valuable information for tasks such as acoustic source localization and recognition (LR), particularly in environments where visual sensing is limited. This paper investigates two real-world scenarios based on unmanned aerial vehicles (UAVs) that leverage acoustic scene awareness: (1) localization and recognition of human speech for search-and-rescue missions, and (2) detection and classification of other UAVs for counter-drone applications. To address these tasks, we design two deep learning models based on convolutional neural networks (CNNs) and a feature-based approach. These models process acoustic signals captured by two types of microphone arrays mounted on UAVs: a 4-microphone linear array and a 19-microphone spherical array. Each model performs direction of arrival (DOA) estimation and source classification under challenging ego-noise conditions using real-world datasets recorded in controlled experimental setups. We evaluate the models across different signal-to-ego-noise ratios and training configurations. Results show robust performance in both localization and recognition tasks: in the human speech scenario, DOA estimates achieve approximately 6 degrees mean error and 7 degrees root mean square error (RMSE), with multi-speaker classification accuracy of up to 0.95; in the UAV sound scenario, DOA estimates achieve 3-5 degrees mean error and 7-11 degrees RMSE, with multi-UAV classification accuracy of up to 0.98. This demonstrates the potential of deep acoustic learning for UAV-based scene understanding in complex operational environments.