Abstract
This work proposes a supervised layer-wise strategy for training deep convolutional neural networks (DCNs) that is particularly suited to small, specialized image datasets. DCNs are used with considerable success in image classification tasks, typically trained on large datasets (more than 1M images and 10K classes). Pre-trained DCNs can then be adapted to new, smaller datasets (10K to 100K images) through transfer learning, but this process cannot guarantee competitive performance a priori when the new data is of a different or specialized nature (medical imaging, plant recognition, etc.). We therefore seek competitive techniques for training DCNs on such small datasets, and describe a supervised greedy layer-wise method analogous to that used in unsupervised deep networks. Our method consistently outperforms the traditional approach of training a full DCN architecture in a single stage, yielding an average improvement of over 20% in classification performance across all DCN architectures and datasets used in this work; it also produces cleaner, more interpretable visual features. Because the method requires one training cycle per DCN layer, its computing time grows almost linearly with the number of layers, which makes it best suited to small, specialized datasets. Nevertheless, this cost remains a fraction of the computing time required to generate pre-trained models on large generic datasets, and the method poses no additional hardware requirements. It thus constitutes a solid alternative for training DCNs when transfer learning is not possible and, furthermore, suggests that state-of-the-art DCN performance on large datasets might yet be improved at the expense of higher computing time.
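To make the greedy layer-wise idea concrete, the sketch below shows one plausible reading in PyTorch: each new convolutional block is trained against the class labels through a temporary auxiliary classifier, then frozen before the next block is appended. This is a minimal illustration under stated assumptions, not the authors' code; the names (`layerwise_train`, `train_stage`, `make_head`, the block definitions) are hypothetical, and details such as what becomes of each stage's auxiliary head are assumptions.

```python
# Hypothetical sketch of supervised greedy layer-wise DCN training.
# All names and architectural choices here are illustrative assumptions.
import torch
import torch.nn as nn

def train_stage(model, head, loader, epochs=1, lr=1e-3):
    """Train the newest (unfrozen) layers plus a temporary auxiliary head."""
    x0, _ = next(iter(loader))
    with torch.no_grad():
        head(model(x0))  # dry forward pass materializes any lazy layers in the head
    params = [p for p in list(model.parameters()) + list(head.parameters())
              if p.requires_grad]
    opt = torch.optim.SGD(params, lr=lr, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(head(model(x)), y).backward()
            opt.step()

def layerwise_train(conv_blocks, make_head, loader, epochs=1):
    """Grow the network one block at a time, supervising each stage with labels."""
    stack = nn.Sequential()
    for i, block in enumerate(conv_blocks):
        stack.add_module(f"block{i}", block)   # append the next layer
        head = make_head()                     # fresh auxiliary classifier per stage
        train_stage(stack, head, loader, epochs)
        for p in stack.parameters():           # freeze everything trained so far
            p.requires_grad = False
    return stack

# Hypothetical usage for 32x32 RGB images and 10 classes:
blocks = [
    nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2)),
    nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2)),
]
make_head = lambda: nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.LazyLinear(10))
# trained = layerwise_train(blocks, make_head, train_loader)
```

The almost-linear growth in computing time mentioned above follows directly from this structure: each of the L layers triggers its own training cycle, so L layers cost roughly L times one stage's training.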
