Abstract
This study builds on BYOL (Bootstrap Your Own Latent), a self-supervised learning model, and combines it with the data generation capability of diffusion models to design an enhanced model for recognizing rare bird species. The original BYOL architecture consists of two neural networks, the online network and the target network. In contrast, we propose the BYOL-3T/4T framework, which incorporates multiple target networks to strengthen the feature learning of the online network. Preparing training data is difficult because rare bird species are, by definition, rare, so real image samples are hard to obtain. We therefore propose using a specially fine-tuned diffusion model to generate a large quantity of augmented image data for rare bird species from a small number of natural images. This augmented data is then used to train the self-supervised learning framework. The advantage of self-supervised learning is that the first stage relies entirely on unlabeled data, and the second stage needs only a small amount of labeled data to fine-tune the model. This substantially reduces the manual effort required for data labeling. Experimental results demonstrate that the proposed model achieves a high F1-score when recognizing natural images of rare bird species. This indicates that the generated augmented training data is of high quality and that the proposed model generalizes well. The combination of the two approaches proves highly practical in real-world applications.
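To make the multi-target idea concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It uses the standard BYOL regression loss (2 minus twice the cosine similarity) and the standard exponential-moving-average (EMA) target update. How BYOL-3T/4T actually combines its target networks is not stated in the abstract, so averaging the loss over targets and giving each target its own momentum `tau` are assumptions made purely for illustration; the function names are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def byol_loss(online_pred, target_proj):
    """Standard BYOL regression loss: 2 - 2 * cosine similarity."""
    p = normalize(online_pred)
    z = normalize(target_proj)
    return 2.0 - 2.0 * sum(a * b for a, b in zip(p, z))


def ema_update(target_params, online_params, tau):
    """Standard BYOL target update: target <- tau * target + (1 - tau) * online."""
    return [tau * t + (1.0 - tau) * o for t, o in zip(target_params, online_params)]


def multi_target_loss(online_pred, target_projs):
    """ASSUMPTION: average the BYOL loss over all target networks.

    The abstract does not say how the 3T/4T targets are combined.
    """
    return sum(byol_loss(online_pred, z) for z in target_projs) / len(target_projs)


# Toy example with flat parameter lists standing in for network weights.
online_params = [1.0, 2.0, 3.0]
target_params = [[0.0, 0.0, 0.0] for _ in range(3)]  # three targets ("3T")
taus = [0.9, 0.99, 0.999]  # ASSUMPTION: one momentum value per target

target_params = [
    ema_update(t, online_params, tau) for t, tau in zip(target_params, taus)
]
```

In this sketch only the online network would receive gradients from `multi_target_loss`; each target is updated solely through `ema_update`, as in the original BYOL.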
