Abstract
Image segmentation, increasingly prevalent in computer vision, plays an essential part in object detection, tracking, and even virtual and augmented reality. Early segmentation methods that relied on hand-crafted features have quickly been superseded by deep learning algorithms. Nonetheless, deep learning algorithms are rarely applied to real-object segmentation because of a lack of ground-truth labels. This work introduces the use of 3D models to generate a segmentation training dataset. The system projects 3D models onto the 2D plane and merges the resulting 2D images with different backgrounds to obtain training images. Because the position of each object in the composite is known, ground-truth labels can be obtained automatically, without manual annotation. Experimental results indicate that the synthetic images can be used to train existing networks such as FCNs and DeepLab, and that the trained models achieve relatively accurate segmentation results on real images. Moreover, a modified model based on DeepLab-CRF-LargeFOV achieves more precise segmentation results by strengthening its localization and edge performance.
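The core data-generation step summarized above, compositing a projected object onto a background and reading the ground-truth mask directly from the render's alpha channel, could be sketched as follows. The `composite` function and the toy arrays are hypothetical illustrations under the assumption that the renderer emits an RGBA image with nonzero alpha where the object is; they are not the authors' code.

```python
import numpy as np

def composite(render_rgba, background_rgb):
    """Alpha-blend a rendered object onto a background image and
    derive the per-pixel segmentation mask at the same time."""
    alpha = render_rgba[..., 3:4] / 255.0               # per-pixel opacity
    image = (alpha * render_rgba[..., :3]
             + (1.0 - alpha) * background_rgb).astype(np.uint8)
    # The label comes for free: object pixels are exactly where alpha > 0.
    mask = (render_rgba[..., 3] > 0).astype(np.uint8)
    return image, mask

# Tiny synthetic example: a 4x4 render containing a 2x2 opaque red object.
render = np.zeros((4, 4, 4), dtype=np.uint8)
render[1:3, 1:3] = [255, 0, 0, 255]                     # fully opaque object
background = np.full((4, 4, 3), 30, dtype=np.uint8)     # uniform grey scene
img, mask = composite(render, background)
print(mask.sum())  # 4 object pixels labeled, with no manual annotation
```

In a full pipeline the background would be a real photograph and the render a projection of the 3D model at a sampled pose; the mask pairs with the composite image as a ready-made training example.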
