Clothes retrieval based on ResNet and cluster triplet loss

Abstract

Clothes image retrieval is the task of retrieving identical or very similar clothes from a large image gallery, given a query image. The high deformability of clothes and the similarity between designs make this task very difficult. Convolutional neural networks perform well at feature extraction, but they do not extract features that are fine-grained enough for clothes retrieval. Therefore, a model based on ResNet is designed in this article, with several improvements made to it to ensure better feature discrimination. In addition, the pair-based loss functions commonly used in image retrieval suffer from insufficient discrimination ability, difficulty in convergence, and time-consuming training. A cluster triplet loss function is proposed in this article, which reduces the computational complexity of the training phase and gives the model some robustness to noisy labels. Experiments are conducted on the In-shop Clothes Retrieval and Consumer-to-shop Clothes Retrieval benchmarks in DeepFashion, supplemented by Stanford Online Products (SOP). Our method achieves Recall@1 improvements of 1.3–1.7% on all three datasets, surpassing the latest state-of-the-art methods.

Keywords

Clothes retrieval, ResNet, triplet loss, deep metric learning
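
The abstract reports results as Recall@1. As a rough illustration of how that metric is usually computed in retrieval work, the sketch below counts a query as a hit when at least one of its K nearest gallery items shares its label. This is not the article's evaluation code: the function name `recall_at_k`, the use of Euclidean distance, and the toy 2-D embeddings and labels are all assumptions made for the example.

```python
import math


def recall_at_k(query_emb, query_labels, gallery_emb, gallery_labels, k=1):
    """Fraction of queries whose k nearest gallery items include a same-label item.

    Hypothetical helper for illustration; Euclidean distance is an assumption.
    """
    hits = 0
    for q, q_label in zip(query_emb, query_labels):
        # Rank gallery indices by distance to the current query embedding.
        ranked = sorted(
            range(len(gallery_emb)),
            key=lambda i: math.dist(q, gallery_emb[i]),
        )
        # The query is a hit if any of the top-k gallery items has its label.
        if any(gallery_labels[i] == q_label for i in ranked[:k]):
            hits += 1
    return hits / len(query_emb)


# Toy data (made up): three gallery items and three queries in 2-D.
gallery_emb = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
gallery_labels = ["shirt", "shirt", "dress"]
query_emb = [(0.1, 0.1), (4.0, 4.0), (3.0, 3.0)]
query_labels = ["shirt", "dress", "shirt"]

r1 = recall_at_k(query_emb, query_labels, gallery_emb, gallery_labels, k=1)
r2 = recall_at_k(query_emb, query_labels, gallery_emb, gallery_labels, k=2)
print(f"Recall@1 = {r1:.3f}, Recall@2 = {r2:.3f}")
```

In this toy setup the third query sits closer to the "dress" item than to either "shirt", so it misses at K=1 and is recovered at K=2. In practice the embeddings would come from the trained network, and the gallery would be far larger.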

