Abstract
Deep hashing is a technique for large-scale image retrieval that encodes the latent codes of images into binary codes, significantly reducing the computational and storage costs of retrieval and enabling fast similarity comparison and search. However, the technique faces two significant challenges: extracting discriminative, category-specific image features, and resolving the conflict between metric learning and quantization learning, which often leaves the binary representation of the latent codes highly ambiguous. To tackle these challenges, this paper proposes a novel Cross-Scale Fusion Deep Hash Network. The model is built on a dual-branch framework designed to capture the most representative retrieval features: one branch employs Spatial Pyramid Pooling layers and a self-attention mechanism to extract local information, while the other uses a sliding-window approach to capture global information. The Cross Feature Synergy Module proposed in this paper then integrates the local and global information into a comprehensive feature vector, yielding a complete representation of the image. To resolve the conflict between metric learning and quantization learning and to further refine the binary codes, this paper introduces a carefully designed, threshold-dependent Hash-Guided Metric Loss (HGM-Loss). The proposed network achieves superior retrieval performance on standard benchmarks across multiple datasets, including CIFAR-10, CIFAR-100, ImageNet, and MS-COCO, outperforming existing hashing methods.
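As background for the retrieval setting the abstract describes, the following is a minimal sketch, not the paper's method, of the general deep-hashing idea: a continuous latent code is quantized to a binary code by sign thresholding, and retrieval then compares codes by Hamming distance. The latent vectors and the 8-bit code length here are hypothetical values chosen only for illustration.

```python
import numpy as np

def binarize(latent):
    # Sign-based quantization: map a continuous latent code to a {0, 1}
    # binary code (non-negative components become 1, negative become 0).
    return (latent >= 0).astype(np.uint8)

def hamming_distance(a, b):
    # Number of differing bits; the cheap similarity measure that makes
    # binary codes attractive for large-scale retrieval.
    return int(np.count_nonzero(a != b))

# Hypothetical 8-dimensional latent codes for two similar images.
z1 = np.array([0.7, -0.2, 0.9,  0.1, -0.5, 0.3, -0.8, 0.4])
z2 = np.array([0.6, -0.1, 0.8, -0.3, -0.4, 0.2, -0.9, 0.5])

b1, b2 = binarize(z1), binarize(z2)
# Similar latent codes yield nearby binary codes under Hamming distance.
print(hamming_distance(b1, b2))  # -> 1 (the codes differ in one bit)
```

The ambiguity mentioned in the abstract arises when latent components sit near the quantization threshold (here, zero), so small perturbations flip bits; this is the tension between metric learning and quantization learning that the paper's HGM-Loss is designed to address.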
