Abstract
Learning the similarity between fashion items is essential for many fashion-related tasks. Most methods based on global or local image similarity cannot meet the fine-grained retrieval requirements related to attributes. We are the first to clearly distinguish the concepts of attribute names and attribute values, and we divide fashion retrieval tasks that combine images and text into two categories: attribute-guided retrieval and attribute-manipulated retrieval. We propose a hierarchical attribute-aware embedding network (HAEN) that takes images and attributes as input, learns multiple attribute-specific embedding spaces, and measures fine-grained similarity in the corresponding spaces. The network accurately maps different attributes to the corresponding regions of the image, which facilitates fusing features from the two modalities, text and image, through enhancement and replacement. On this basis, we propose three attribute-manipulated similarity learning methods: HAEN_Avg, HAEN_Rec, and HAEN_Cmb. Through comprehensive experiments on two real-world fashion datasets, we demonstrate that our methods effectively leverage semantic knowledge to improve image retrieval performance on both attribute-guided and attribute-manipulated retrieval tasks.
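
To make the core idea concrete, the following is a minimal PyTorch sketch (not the authors' implementation) of an attribute-specific embedding: a learned attribute embedding attends over spatial image features to pool the image regions relevant to that attribute, and similarity is then measured within the resulting attribute space. All names and dimensions (AttributeSpecificEmbedding, feat_dim, num_attributes) are illustrative assumptions.

    # Minimal sketch of attribute-specific embedding spaces, assuming
    # convolutional feature maps as image input and integer attribute ids.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class AttributeSpecificEmbedding(nn.Module):
        def __init__(self, feat_dim=512, embed_dim=128, num_attributes=8):
            super().__init__()
            # One learned query vector per attribute (e.g., "collar", "sleeve").
            self.attr_embed = nn.Embedding(num_attributes, feat_dim)
            self.proj = nn.Linear(feat_dim, embed_dim)

        def forward(self, feat_map, attr_idx):
            # feat_map: (B, C, H, W) image features; attr_idx: (B,) attribute ids.
            B, C, H, W = feat_map.shape
            q = self.attr_embed(attr_idx)                     # (B, C) attribute query
            k = feat_map.flatten(2)                           # (B, C, H*W)
            # Attention over spatial locations: maps the attribute to image regions.
            attn = torch.softmax((q.unsqueeze(1) @ k).squeeze(1) / C ** 0.5, dim=-1)
            attended = (k * attn.unsqueeze(1)).sum(-1)        # (B, C) attribute-guided pooling
            # Project into the attribute-specific embedding space.
            return F.normalize(self.proj(attended), dim=-1)

    # Fine-grained similarity between two images under the same attribute:
    model = AttributeSpecificEmbedding()
    x1, x2 = torch.randn(4, 512, 7, 7), torch.randn(4, 512, 7, 7)
    attr = torch.randint(0, 8, (4,))
    sim = (model(x1, attr) * model(x2, attr)).sum(-1)         # cosine similarity per pair

Under this framing, attribute manipulation would amount to replacing or enhancing the pooled attribute feature with the embedding of a target attribute value before projection; the specific fusion strategies are what HAEN_Avg, HAEN_Rec, and HAEN_Cmb vary.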
