Abstract
To address challenges in orchard environments, including insufficient real-time performance, the trade-off between detection accuracy and model size, and interference from complex lighting and occlusion, this study proposes a lightweight and efficient apple detection model, YOLO-EAD (efficient apple detection). The model enhances YOLOv8 by replacing its backbone with the EfficientViT-T network to reduce computational complexity, introducing a self-attention-based detection head (SA-detect) to streamline the detection branches, and integrating a coordinate attention (CA) mechanism into the neck layer to improve feature focus. Additionally, the SIoU loss function is adopted for more precise bounding box regression. Together, these changes reduce the model size to 4.1 MB and the computational cost to 9.3 GFLOPs while achieving a high mAP@0.5 of 96.7%. Compared with the original YOLOv8s, YOLO-EAD reduces computational complexity by 67.5% while improving detection accuracy and robustness under varied lighting and occlusion conditions, making it well suited to real-time deployment on edge devices in agricultural robotics.
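Of the components the abstract names, the coordinate attention (CA) mechanism is the most self-contained to illustrate: it factorizes spatial attention into two 1-D encodings along the height and width axes, as introduced by Hou et al. (CVPR 2021). The following is a minimal PyTorch sketch of such a block under that paper's design, not the authors' exact implementation; the class name, reduction ratio, and use of average pooling are assumptions.

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Minimal coordinate attention block (after Hou et al., CVPR 2021).

    Encodes channel relationships along height and width separately, then
    reweights the input feature map with the two directional attention maps.
    """

    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)  # assumed bottleneck width
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Direction-aware pooling: average over width and over height.
        x_h = x.mean(dim=3, keepdim=True)                      # (b, c, h, 1)
        x_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)  # (b, c, w, 1)
        # Shared 1x1 transform over the concatenated directional encodings.
        y = self.act(self.bn(self.conv1(torch.cat([x_h, x_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                       # (b, c, h, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # (b, c, 1, w)
        # Broadcast-multiply the two attention maps onto the input.
        return x * a_h * a_w

# Usage sketch: reweight a neck feature map without changing its shape.
feat = torch.randn(1, 256, 40, 40)
out = CoordinateAttention(256)(feat)
assert out.shape == feat.shape
```

Because the block preserves the feature map's shape, it can be dropped between existing neck layers; its cost is dominated by the 1x1 convolutions, which is consistent with the abstract's emphasis on keeping the model lightweight.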
