Abstract
Deep hashing is a technique for large-scale image retrieval that encodes the latent codes of images into binary codes, significantly reducing the computational and storage costs of retrieval and enabling fast similarity comparison and search. However, the technique faces two significant challenges: extracting discriminative, category-specific image features, and resolving the conflict between metric learning and quantization learning, which often leaves the binary representation of the latent codes highly ambiguous. To tackle these challenges, this paper proposes a novel Cross-Scale Fusion Deep Hash Network. The model is built on a dual-branch framework designed to capture the most representative retrieval features: one branch employs Spatial Pyramid Pooling layers and a self-attention mechanism to extract local information, while the other uses a sliding-window approach to capture global information. The Cross Feature Synergy Module proposed in this paper then integrates the local and global information into a comprehensive feature vector, yielding a complete representation of the image. To resolve the conflict between metric learning and quantization learning and to further refine the binary codes, this paper introduces a carefully designed, threshold-dependent Hash-Guided Metric Loss (HGM-Loss). The proposed network achieves superior retrieval performance on standard benchmarks across multiple datasets, including CIFAR-10, CIFAR-100, ImageNet, and MS-COCO, outperforming existing hashing methods.
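As background for the retrieval setting the abstract describes, the following is a minimal sketch, not the paper's method, of the general deep-hashing idea: a continuous latent code is quantized to a binary code by sign thresholding, and retrieval then compares codes by Hamming distance. The latent vectors and the 8-bit code length here are hypothetical values chosen only for illustration.

```python
import numpy as np

def binarize(latent):
    # Sign-based quantization: map a continuous latent code to a {0, 1}
    # binary code (non-negative components become 1, negative become 0).
    return (latent >= 0).astype(np.uint8)

def hamming_distance(a, b):
    # Number of differing bits; the cheap similarity measure that makes
    # binary codes attractive for large-scale retrieval.
    return int(np.count_nonzero(a != b))

# Hypothetical 8-dimensional latent codes for two similar images.
z1 = np.array([0.7, -0.2, 0.9,  0.1, -0.5, 0.3, -0.8, 0.4])
z2 = np.array([0.6, -0.1, 0.8, -0.3, -0.4, 0.2, -0.9, 0.5])

b1, b2 = binarize(z1), binarize(z2)
# Similar latent codes yield nearby binary codes under Hamming distance.
print(hamming_distance(b1, b2))  # -> 1 (the codes differ in one bit)
```

The ambiguity mentioned in the abstract arises when latent components sit near the quantization threshold (here, zero), so small perturbations flip bits; this is the tension between metric learning and quantization learning that the paper's HGM-Loss is designed to address.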
