Abstract
This work proposes a supervised layer-wise strategy for training deep convolutional neural networks (DCNs) that is particularly suited to small, specialized image datasets. DCNs are used with considerable success in image classification tasks, typically trained on large datasets (more than 1M images and 10K classes). Pre-trained DCNs can then be adapted to new, smaller datasets (10K to 100K images) through transfer learning, but this process cannot guarantee competitive performance a priori when the new data is of a different or specialized nature (medical imaging, plant recognition, etc.). We therefore seek competitive techniques for training DCNs on such small datasets, and describe a supervised greedy layer-wise method analogous to that used in unsupervised deep networks. Our method consistently outperforms the traditional approach of training a full DCN architecture in a single stage, yielding an average improvement of over 20% in classification performance across all DCN architectures and datasets used in this work; it also produces cleaner, more interpretable visual features. Because the method requires one training cycle per DCN layer, its computing time grows almost linearly with the number of layers, which makes it best suited to small, specialized datasets. Nevertheless, this cost remains a fraction of the computing time required to generate pre-trained models on large generic datasets, and the method poses no additional hardware requirements. It thus constitutes a solid alternative for training DCNs when transfer learning is not possible and, furthermore, suggests that state-of-the-art DCN performance on large datasets might yet be improved at the expense of higher computing time.
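To make the greedy layer-wise idea concrete, the sketch below shows one plausible reading in PyTorch: each new convolutional block is trained against the class labels through a temporary auxiliary classifier, then frozen before the next block is appended. This is a minimal illustration under stated assumptions, not the authors' code; the names (`layerwise_train`, `train_stage`, `make_head`, the block definitions) are hypothetical, and details such as what becomes of each stage's auxiliary head are assumptions.

```python
# Hypothetical sketch of supervised greedy layer-wise DCN training.
# All names and architectural choices here are illustrative assumptions.
import torch
import torch.nn as nn

def train_stage(model, head, loader, epochs=1, lr=1e-3):
    """Train the newest (unfrozen) layers plus a temporary auxiliary head."""
    x0, _ = next(iter(loader))
    with torch.no_grad():
        head(model(x0))  # dry forward pass materializes any lazy layers in the head
    params = [p for p in list(model.parameters()) + list(head.parameters())
              if p.requires_grad]
    opt = torch.optim.SGD(params, lr=lr, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(head(model(x)), y).backward()
            opt.step()

def layerwise_train(conv_blocks, make_head, loader, epochs=1):
    """Grow the network one block at a time, supervising each stage with labels."""
    stack = nn.Sequential()
    for i, block in enumerate(conv_blocks):
        stack.add_module(f"block{i}", block)   # append the next layer
        head = make_head()                     # fresh auxiliary classifier per stage
        train_stage(stack, head, loader, epochs)
        for p in stack.parameters():           # freeze everything trained so far
            p.requires_grad = False
    return stack

# Hypothetical usage for 32x32 RGB images and 10 classes:
blocks = [
    nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2)),
    nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2)),
]
make_head = lambda: nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.LazyLinear(10))
# trained = layerwise_train(blocks, make_head, train_loader)
```

The almost-linear growth in computing time mentioned above follows directly from this structure: each of the L layers triggers its own training cycle, so L layers cost roughly L times one stage's training.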
