Joint index and cache technique for improving the effectiveness of a similar image search in big data framework

Abstract

The exponentially growing volume of digital images poses a great challenge to content-based image retrieval (CBIR) systems because of the extensive computation that retrieval requires. To address this challenge, this paper presents an approach that makes a CBIR system both effective and scalable on large-scale datasets. We propose a cache mechanism that spares the distance computations of a retrieval task. We also present a MapReduce technique that exploits the cached data in parallel, which both improves the performance of the CBIR system and ensures its scalability. In addition, a collaborative caching service is introduced to enhance data availability, thereby reducing the network traffic caused by fetching data remotely in a distributed environment. Moreover, because the dataset is clustered before a search, the system responds to a user query efficiently: only a portion of the dataset is processed at a time. In experiments, our approach achieves significant efficiency gains in response time over other methods while maintaining an acceptable accuracy ratio, which makes it applicable in practical environments.

Keywords

CBIR, cache, index scheme, MapReduce, cloud
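The abstract combines two ideas that are easy to illustrate in isolation: reusing distances that were already computed, and restricting a search to one cluster of the dataset. The following Python fragment is a minimal sketch of those two ideas only, not the authors' implementation. The class `DistanceCache`, the function `search`, the toy feature vectors, the Euclidean metric, and the choice to key the cache on a (query id, image id) pair are all assumptions made for illustration; the abstract does not specify them, and the MapReduce and collaborative caching parts are not modeled here.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Plain Euclidean distance (an assumed metric, chosen for simplicity)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class DistanceCache:
    """Hypothetical cache that memoizes distances by (query_id, image_id)."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], float] = {}
        self.hits = 0
        self.misses = 0

    def distance(self, query_id: str, query_vec: Vector,
                 image_id: str, image_vec: Vector) -> float:
        key = (query_id, image_id)
        if key in self._store:
            self.hits += 1          # distance reused, no computation needed
            return self._store[key]
        self.misses += 1            # first time this pair is seen: compute and store
        d = euclidean(query_vec, image_vec)
        self._store[key] = d
        return d


def search(query_id: str, query_vec: Vector,
           centroids: Dict[str, Vector],
           clusters: Dict[str, Dict[str, Vector]],
           cache: DistanceCache, k: int = 2) -> List[str]:
    """Return the k nearest image ids, scanning only the closest cluster."""
    # Pick the cluster whose centroid is nearest to the query.
    nearest = min(centroids, key=lambda c: euclidean(query_vec, centroids[c]))
    # Score only the images in that cluster, going through the cache.
    scored = [
        (cache.distance(query_id, query_vec, image_id, vec), image_id)
        for image_id, vec in clusters[nearest].items()
    ]
    scored.sort()
    return [image_id for _, image_id in scored[:k]]


# Toy data: two pre-computed clusters of 2-D "feature vectors".
CENTROIDS: Dict[str, Vector] = {"c0": (0.0, 0.0), "c1": (10.0, 10.0)}
CLUSTERS: Dict[str, Dict[str, Vector]] = {
    "c0": {"a": (0.0, 1.0), "b": (1.0, 1.0), "c": (3.0, 0.0)},
    "c1": {"d": (10.0, 9.0), "e": (11.0, 11.0)},
}

if __name__ == "__main__":
    cache = DistanceCache()
    print(search("q1", (0.0, 0.5), CENTROIDS, CLUSTERS, cache))  # computes 3 distances
    print(search("q1", (0.0, 0.5), CENTROIDS, CLUSTERS, cache))  # served from the cache
    print(cache.hits, cache.misses)
```

In this sketch a repeated query touches only the three images of its nearest cluster and, on the second call, performs no distance computation at all. How a real system would decide that two queries are similar enough to share cached distances is a separate design question that the sketch does not address.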
