Automatic image annotation using model fusion and multi-label selection algorithm

Abstract

Automatic Image Annotation (AIA) aims to provide a semantic description of an image's content by assigning it a set of textual labels. Recent approaches mainly focus on improving a single model and neglect the potential advantages of combining different models. To make full use of those advantages, this research proposes the Dual Model based on Multi-Label Selection Algorithm (DM-SA), which combines a discriminative model with a nearest-neighbor-based model. The algorithm draws on the strengths of each model and thus provides better annotation performance. A deep Convolutional Neural Network (CNN) is first used to obtain visual representations of the images. A discriminative model, CNN with Label Smoothing (CNN-LS), and a nearest-neighbor-based model, 2PKNN with Canonical Correlation Analysis (2PKNN-CCA), then each generate a candidate label set. Finally, a multi-label selection algorithm based on inverse document frequency assigns the final labels from the two candidate label sets. Experimental results on the Corel5K and IAPRTC-12 datasets show that the proposed method achieves state-of-the-art average recall: 0.52 on Corel5K and 0.42 on IAPRTC-12.
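The final fusion step can be illustrated with a small sketch. The abstract does not spell out the selection rule, so the rule below is an assumption made purely for illustration: labels proposed by both models are kept first, and the remaining slots are filled with the other candidates in order of decreasing inverse document frequency (IDF), which favors rarer labels. The function names `label_idf` and `select_labels`, the parameter `k`, and the toy data are all hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def label_idf(train_annotations):
    """IDF per label: log(N / n_w), where n_w is the number of
    training images annotated with label w and N is the image count."""
    n_images = len(train_annotations)
    counts = Counter(
        label for labels in train_annotations for label in set(labels)
    )
    return {label: math.log(n_images / n) for label, n in counts.items()}


def select_labels(candidates_a, candidates_b, idf, k=5):
    """Fuse two candidate label sets into at most k final labels.

    Assumed rule (not from the paper): labels proposed by both models
    come first; remaining slots go to labels proposed by only one model.
    Within each group, labels are ordered by decreasing IDF, with ties
    broken alphabetically so the result is deterministic.
    """
    set_a, set_b = set(candidates_a), set(candidates_b)

    def by_idf(label):
        return (-idf.get(label, 0.0), label)

    agreed = sorted(set_a & set_b, key=by_idf)
    disputed = sorted(set_a ^ set_b, key=by_idf)
    return (agreed + disputed)[:k]


# Toy training annotations (hypothetical data).
train = [
    ["sky", "water", "boat"],
    ["sky", "tree"],
    ["sky", "water", "bear"],
    ["sky", "tree", "grass"],
]
idf = label_idf(train)

# Hypothetical candidate sets from the two models.
cnn_ls_candidates = ["sky", "water", "tree"]
knn_cca_candidates = ["sky", "bear", "water"]

final = select_labels(cnn_ls_candidates, knn_cca_candidates, idf, k=3)
print(final)
```

In this toy run the two models agree on "sky" and "water", and the rare label "bear" wins the last slot over the more frequent "tree". Preferring high-IDF labels among the disputed candidates is one plausible way such a scheme could help recall on infrequent labels, but the paper's actual rule may differ.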

Keywords

Automatic image annotation, deep learning, CNN, model fusion, multi-label selection
