Abstract
To address the challenges of difficult feature extraction and complex background interference in unmanned aerial vehicle (UAV) remote sensing vehicle target detection, YOLOv11 is chosen as the baseline model for its fast detection speed, high detection accuracy, and ability to precisely identify small targets. First, we introduce the FasterNet network, which maintains high detection accuracy while markedly increasing computational speed and significantly reducing the parameter count and computational complexity. Second, we incorporate the Efficient Multi-scale Attention (EMA) mechanism into the FasterNet-based backbone; through multi-scale feature fusion and an efficient computational design, EMA enhances the model's understanding of complex scene information. Finally, we integrate the DySample and Lightweight Adaptive Extraction (LAE) modules into the neck network; these two modules provide dynamic feature alignment and adaptive channel optimization, improving the allocation of computational resources. Building on these improved modules, we propose the YOLOv11-FEDL model, whose optimized feature extraction mechanism forms an efficient, flexible, and robust object detection framework that improves detection performance while lowering computational complexity. In lightweight experiments on the Cardrone and VisDrone2019 datasets, the mAP@0.5 reached 89.7% and 39.1%, respectively; the inference speed increased to 301.80 FPS and 334.10 FPS, respectively; and the average inference time fell to 2.41 ms and 2.43 ms, respectively. The improved model maintains comparable detection accuracy while reducing the parameter count to 2.33 M and the computational complexity to 5.3 GFLOPs, increasing the inference speed by approximately 20%.
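To illustrate the core idea behind FasterNet's efficiency gains, a minimal PyTorch sketch of a partial convolution (PConv) block follows: a standard convolution is applied to only a fraction of the input channels while the remaining channels pass through unchanged, which reduces FLOPs and memory access. The class name, channel split ratio, and tensor shapes here are our own illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class PartialConv(nn.Module):
    """Sketch of FasterNet-style partial convolution (PConv): a 3x3 conv
    touches only a fraction of the channels; the rest are passed through,
    cutting computation relative to a full convolution."""

    def __init__(self, channels: int, partial_ratio: float = 0.25):
        super().__init__()
        # partial_ratio is an assumed hyperparameter for illustration.
        self.conv_channels = int(channels * partial_ratio)   # channels convolved
        self.pass_channels = channels - self.conv_channels   # channels passed through
        self.conv = nn.Conv2d(self.conv_channels, self.conv_channels,
                              kernel_size=3, padding=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Split along the channel dimension, convolve one part, keep the other.
        x_conv, x_pass = torch.split(
            x, [self.conv_channels, self.pass_channels], dim=1)
        return torch.cat([self.conv(x_conv), x_pass], dim=1)

# Usage: a 64-channel feature map in which only 16 channels are convolved.
feat = torch.randn(1, 64, 80, 80)
print(PartialConv(64)(feat).shape)  # torch.Size([1, 64, 80, 80])
```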
