Abstract
Learning the similarity between fashion items is essential for many fashion-related tasks. Most methods based on global or local image similarity cannot meet the fine-grained retrieval requirements related to attributes. We are the first to clearly distinguish the concepts of attribute names and attribute values, and we divide fashion retrieval tasks that combine images and text into two categories: attribute-guided retrieval and attribute-manipulated retrieval. We propose a hierarchical attribute-aware embedding network (HAEN) that takes images and attributes as input, learns multiple attribute-specific embedding spaces, and measures fine-grained similarity in the corresponding spaces. The network accurately maps different attributes to the corresponding regions of the image, which facilitates fusing features from the two modalities, text and image, through enhancement and replacement. On this basis, we propose three attribute-manipulated similarity learning methods: HAEN_Avg, HAEN_Rec, and HAEN_Cmb. Through comprehensive experiments on two real-world fashion datasets, we demonstrate that our methods effectively leverage semantic knowledge to improve image retrieval performance on both attribute-guided and attribute-manipulated retrieval tasks.
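
To make the core idea concrete, the following is a minimal PyTorch sketch (not the authors' implementation) of an attribute-specific embedding: a learned attribute embedding attends over spatial image features to pool the image regions relevant to that attribute, and similarity is then measured within the resulting attribute space. All names and dimensions (AttributeSpecificEmbedding, feat_dim, num_attributes) are illustrative assumptions.

    # Minimal sketch of attribute-specific embedding spaces, assuming
    # convolutional feature maps as image input and integer attribute ids.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class AttributeSpecificEmbedding(nn.Module):
        def __init__(self, feat_dim=512, embed_dim=128, num_attributes=8):
            super().__init__()
            # One learned query vector per attribute (e.g., "collar", "sleeve").
            self.attr_embed = nn.Embedding(num_attributes, feat_dim)
            self.proj = nn.Linear(feat_dim, embed_dim)

        def forward(self, feat_map, attr_idx):
            # feat_map: (B, C, H, W) image features; attr_idx: (B,) attribute ids.
            B, C, H, W = feat_map.shape
            q = self.attr_embed(attr_idx)                     # (B, C) attribute query
            k = feat_map.flatten(2)                           # (B, C, H*W)
            # Attention over spatial locations: maps the attribute to image regions.
            attn = torch.softmax((q.unsqueeze(1) @ k).squeeze(1) / C ** 0.5, dim=-1)
            attended = (k * attn.unsqueeze(1)).sum(-1)        # (B, C) attribute-guided pooling
            # Project into the attribute-specific embedding space.
            return F.normalize(self.proj(attended), dim=-1)

    # Fine-grained similarity between two images under the same attribute:
    model = AttributeSpecificEmbedding()
    x1, x2 = torch.randn(4, 512, 7, 7), torch.randn(4, 512, 7, 7)
    attr = torch.randint(0, 8, (4,))
    sim = (model(x1, attr) * model(x2, attr)).sum(-1)         # cosine similarity per pair

Under this framing, attribute manipulation would amount to replacing or enhancing the pooled attribute feature with the embedding of a target attribute value before projection; the specific fusion strategies are what HAEN_Avg, HAEN_Rec, and HAEN_Cmb vary.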
