Dual discriminative auto-encoder network for zero shot image recognition

Abstract

Zero-shot learning (ZSL) aims to recognize unseen classes using information from seen classes, which is achieved by transferring knowledge from the seen classes through semantic embeddings. Because the seen and unseen classes do not overlap, most ZSL algorithms suffer from the domain shift problem. In this paper, we propose a Dual Discriminative Auto-encoder Network (DDANet), in which visual features and semantic attributes are each auto-encoded through a high-dimensional latent space rather than the visual feature space or the low-dimensional semantic space. In the embedded latent space, features are projected so that they both preserve their original semantic meaning and are discriminative; this is realized by a dual semantic auto-encoder and a discriminative feature embedding strategy. Moreover, cross-modal reconstruction is applied to obtain interactive information between the two modalities. Extensive experiments on four popular datasets demonstrate the superiority of the proposed method.
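The dual auto-encoder structure described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the linear encoders and decoders, the mean-squared-error losses, the dimensions, and all names (`init_linear`, `dual_autoencoder_losses`, `enc_v`, `dec_s`, and so on). The discriminative embedding term is left out because the abstract does not specify its form.

```python
import numpy as np


def init_linear(rng, in_dim, out_dim):
    """Small random weight matrix for a linear map in_dim -> out_dim."""
    return rng.normal(scale=0.1, size=(in_dim, out_dim))


def mse(pred, target):
    """Mean squared error between two arrays of the same shape."""
    return float(np.mean((pred - target) ** 2))


def dual_autoencoder_losses(x, a, params):
    """Self- and cross-modal reconstruction errors for one batch.

    x: (n, d_v) visual features.
    a: (n, d_s) semantic attributes of each sample's class.
    params: dict of four weight matrices (hypothetical names).
    """
    z_v = x @ params["enc_v"]        # visual features -> latent space
    z_s = a @ params["enc_s"]        # semantic attributes -> latent space

    x_self = z_v @ params["dec_v"]   # visual self-reconstruction
    a_self = z_s @ params["dec_s"]   # semantic self-reconstruction

    x_cross = z_s @ params["dec_v"]  # semantic latent decoded as visual
    a_cross = z_v @ params["dec_s"]  # visual latent decoded as semantic

    return {
        "self_visual": mse(x_self, x),
        "self_semantic": mse(a_self, a),
        "cross_visual": mse(x_cross, x),
        "cross_semantic": mse(a_cross, a),
    }


# Toy dimensions (assumed): the latent space is larger than the semantic space.
rng = np.random.default_rng(0)
d_v, d_s, d_z, n = 16, 5, 32, 8
params = {
    "enc_v": init_linear(rng, d_v, d_z),
    "dec_v": init_linear(rng, d_z, d_v),
    "enc_s": init_linear(rng, d_s, d_z),
    "dec_s": init_linear(rng, d_z, d_s),
}
x = rng.normal(size=(n, d_v))  # random stand-in for visual features
a = rng.normal(size=(n, d_s))  # random stand-in for class attributes
losses = dual_autoencoder_losses(x, a, params)
```

In a trainable version, these four terms would be weighted and summed with a discriminative term and minimized with respect to `params`; the weighting here is left open because it is not given in the abstract.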

Keywords

Zero shot learning, domain shift, dual auto-encoder, discriminative projection
