
Abstract

Semantic image interpretation (SII) is the process of generating meaningful descriptions of the content of images. Background knowledge (BK), in the form of logical theories, is extremely useful for SII. State-of-the-art algorithms for SII mainly adopt a bottom-up approach, which generates semantic interpretations of images starting from their low-level features. In these approaches, BK is used only at a late stage, both to enrich the semantic descriptions and to improve image retrieval. In this paper, we show that BK also plays an important role during the early phase of SII. To this end, we propose: (i) a reference framework in which a semantic image description is a partial model of the BK, whose elements are grounded (linked) to one image segment or a set of image segments; (ii) a loss function that evaluates how well this partial model fits the picture; and (iii) a clustering-based optimization process that searches for the partial model that best fits a picture. BK is used to prune branches of the search space that correspond to partial models inconsistent with BK. To evaluate our approach, we built a gold standard dataset of 203 pictures annotated with complex objects and their parts. We also evaluated our method on a reference dataset in Computer Vision, namely the PASCAL-Part dataset. The results are positive. This evaluation assumes perfect detection of parts. To understand the impact of realistic (and noisy) part detection on our algorithm, we carried out a preliminary evaluation by implementing the entire SII pipeline, in which part detection is performed by a recent deep learning architecture trained for detecting parts. A qualitative analysis shows that recognizing complex objects starting from their parts in some cases gives better results than detecting complex objects directly.
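The interplay between the loss function, the clustering-based search, and BK-driven pruning can be illustrated with a toy sketch. This is not the authors' algorithm: the `BK` dictionary, the part labels, the greedy centroid-distance merging, the `max_merge_dist` threshold, and the squared-distance loss are all assumptions introduced here purely for illustration. Detected parts are given as `(label, (x, y))` pairs, and a group of parts is kept only if some whole type in the hypothetical BK admits it.

```python
from collections import Counter
from itertools import combinations
from math import dist

# Hypothetical background knowledge (an assumption, not from the paper):
# for each whole type, the maximum number of parts of each type it may have.
BK = {
    "Bicycle": {"Wheel": 2, "Saddle": 1, "Handlebar": 1},
    "Horse": {"Head": 1, "Leg": 4, "Tail": 1},
}


def consistent_wholes(labels):
    """Return the whole types in BK that admit this multiset of part labels."""
    counts = Counter(labels)
    return [
        whole
        for whole, limits in BK.items()
        if all(counts[part] <= limits.get(part, 0) for part in counts)
    ]


def centroid(points):
    """Mean position of a non-empty list of (x, y) points."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def loss(clusters):
    """Toy loss: total squared distance of each part to its cluster centroid."""
    total = 0.0
    for cluster in clusters:
        c = centroid([pos for _, pos in cluster])
        total += sum(dist(pos, c) ** 2 for _, pos in cluster)
    return total


def interpret(parts, max_merge_dist=5.0):
    """Greedily group parts into wholes, skipping merges that BK rules out."""
    clusters = [[p] for p in parts]
    while True:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            merged = clusters[i] + clusters[j]
            if not consistent_wholes([label for label, _ in merged]):
                continue  # pruned: no whole type in BK admits this group
            d = dist(
                centroid([pos for _, pos in clusters[i]]),
                centroid([pos for _, pos in clusters[j]]),
            )
            if d <= max_merge_dist and (best is None or d < best[0]):
                best = (d, i, j)
        if best is None:
            break  # no admissible merge is left
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    # Pair every final cluster with the whole types that BK allows for it.
    return [(consistent_wholes([label for label, _ in c]), c) for c in clusters]


parts = [
    ("Wheel", (0.0, 0.0)),
    ("Wheel", (4.0, 0.0)),
    ("Saddle", (2.0, 2.0)),
    ("Leg", (2.0, 1.0)),
]
result = interpret(parts)
```

In this example the leg lies spatially in the middle of the bicycle parts, yet it is never merged with them because no whole type in the assumed BK contains both a leg and a wheel. A purely geometric clustering would not make that distinction, which is the kind of early-stage benefit of BK that the abstract argues for.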

Keywords

Information extraction, computer vision, semantic image interpretation, ontologies, clustering



Integration of numeric and symbolic information for semantic image interpretation
