Sage Journals: Discover world-class research

Abstract

This study combines BYOL (Bootstrap Your Own Latent), a self-supervised learning model, with the data generation capability of diffusion models to design an enhanced model for recognizing rare bird species. The original BYOL architecture consists of two neural networks, the online network and the target network. In contrast, we propose the BYOL-3T/4T framework, which incorporates multiple target networks to enhance the feature learning capability of the online network. Regarding training data, rare bird species are by definition rare, so real image samples of them are difficult to obtain. We therefore propose using a specially fine-tuned diffusion model to generate large quantities of augmented image data for rare bird species from a small number of natural images. This augmented data is then used with the self-supervised learning framework for training. The advantage of self-supervised learning is that the first stage relies entirely on unlabeled data, and the second stage needs only a small amount of labeled data to fine-tune the model, which significantly reduces the manual effort required for data labeling. Experimental results demonstrate that the proposed model achieves a high F1-score when recognizing natural images of rare bird species. This indicates that the generated augmented data is of high quality and that the proposed model generalizes well. The combination of these two approaches proves highly practical for real-world applications.
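
The abstract does not state how the additional target networks are updated or how their outputs are combined, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It uses the standard BYOL regression loss (2 minus twice the cosine similarity) and the standard exponential moving average (EMA) target update. Everything else is an assumption made for illustration: the function names, the choice to average the loss over targets, the use of a different decay rate per target, and the toy vectors.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def byol_loss(prediction, target_projection):
    """Standard BYOL loss: 2 - 2 * cosine similarity, in the range [0, 4]."""
    p = l2_normalize(prediction)
    z = l2_normalize(target_projection)
    return 2.0 - 2.0 * sum(a * b for a, b in zip(p, z))

def ema_update(target_params, online_params, decay):
    """Standard BYOL target update: t <- decay * t + (1 - decay) * o."""
    return [decay * t + (1.0 - decay) * o
            for t, o in zip(target_params, online_params)]

def multi_target_loss(prediction, target_projections):
    """ASSUMPTION: combine targets by averaging their BYOL losses."""
    losses = [byol_loss(prediction, z) for z in target_projections]
    return sum(losses) / len(losses)

# Toy data (hypothetical): one online prediction and three target projections,
# as the name BYOL-3T suggests.
online_prediction = [1.0, 0.0]
target_projections = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
loss = multi_target_loss(online_prediction, target_projections)

# ASSUMPTION: each target network tracks the online weights at its own rate.
online_weights = [1.0, 2.0]
decays = [0.9, 0.99, 0.999]
target_weights = [[0.0, 0.0] for _ in decays]
target_weights = [ema_update(t, online_weights, d)
                  for t, d in zip(target_weights, decays)]
```

In this toy case the three targets are aligned with, orthogonal to, and opposite to the online prediction, so the individual losses are 0, 2, and 4. In a real training loop only the online network would receive gradients, and the targets would be refreshed by `ema_update` after each step.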

Keywords

bird recognition, rare bird species, self-supervised learning, diffusion model, BYOL



Rare bird species recognition using diffusion models and self-supervised learning
