Abstract
In autonomous systems and robotics, acoustic signals provide valuable information for tasks such as acoustic source localization and recognition (LR), particularly in environments where visual sensing is limited. This paper investigates two real-world scenarios based on unmanned aerial vehicles (UAVs) that leverage acoustic scene awareness: (1) localization and recognition of human speech for search-and-rescue missions, and (2) detection and classification of other UAVs for counter-drone applications. To address these tasks, we design two deep learning models based on convolutional neural networks (CNNs) and a feature-based approach. These models process acoustic signals captured by two types of microphone arrays mounted on UAVs: a 4-microphone linear array and a 19-microphone spherical array. Each model performs direction of arrival (DOA) estimation and source classification under challenging ego-noise conditions using real-world datasets recorded in controlled experimental setups. We evaluate the models across different signal-to-ego-noise ratios and training configurations. Results show robust performance in both localization and recognition tasks: in the human speech scenario, DOA estimates achieve approximately 6 degrees mean error and 7 degrees root mean square error (RMSE), with multi-speaker classification accuracy of up to 0.95; in the UAV sound scenario, DOA estimates achieve 3-5 degrees mean error and 7-11 degrees RMSE, with multi-UAV classification accuracy of up to 0.98. This demonstrates the potential of deep acoustic learning for UAV-based scene understanding in complex operational environments.