Optimizing deep neural networks hyperparameter positions and values

Abstract

Hyperparameter optimization is a crucial step in the implementation of any machine learning model. The process involves repeatedly modifying the model's hyperparameter values in order to minimize the testing error. For a deep neural network, hyperparameter optimization covers both the model parameters and the architecture. Optimizing a model's parameters means choosing values for parameters such as the learning rate and the batch size. Optimizing architectural hyperparameters means deciding the shape of the network, i.e., the number of layers of each type and the number of neurons in a given layer. State-of-the-art hyperparameter optimization methods do not optimize the position of a hyperparameter within the model architecture. In this work, we study the effect of changing the position of a hyperparameter within the deep learning model architecture, and we propose an architectural position optimization (ArchPosOpt) method for architectural hyperparameter optimization. ArchPosOpt extends three hyperparameter optimization techniques, namely grid search, random search, and the Tree-structured Parzen Estimator (TPE), with a new dimension of the hyperparameter optimization problem: the hyperparameter position. We show through a set of experiments that the position of a hyperparameter, as well as its value, affects model performance.
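The idea of treating position as one more search dimension can be illustrated with a minimal sketch. This is not the authors' ArchPosOpt implementation: the search space, the choice of dropout as the hyperparameter, and the `toy_validation_error` objective are all assumptions made for illustration, and the objective is a synthetic formula rather than a trained network. TPE is omitted; only grid search and random search are shown.

```python
import itertools
import random

# Hypothetical search space: one value dimension and one position dimension.
DROPOUT_RATES = [0.1, 0.3, 0.5]
DROPOUT_POSITIONS = [0, 1, 2, 3]  # index of the layer the dropout follows


def toy_validation_error(rate, position):
    """Synthetic stand-in for training a model and measuring its error.

    Assumption for illustration only: the error is lowest for rate 0.3
    placed after layer 2. A real objective would build and train a network.
    """
    return (rate - 0.3) ** 2 + 0.05 * abs(position - 2)


def grid_search(objective):
    """Evaluate every (value, position) pair and return the best pair."""
    candidates = itertools.product(DROPOUT_RATES, DROPOUT_POSITIONS)
    return min(candidates, key=lambda c: objective(*c))


def random_search(objective, n_trials, seed=0):
    """Sample (value, position) pairs at random; return the best pair and its error."""
    rng = random.Random(seed)
    best, best_err = None, float("inf")
    for _ in range(n_trials):
        cand = (rng.choice(DROPOUT_RATES), rng.choice(DROPOUT_POSITIONS))
        err = objective(*cand)
        if err < best_err:
            best, best_err = cand, err
    return best, best_err
```

In this sketch the position is sampled exactly like any other categorical hyperparameter, so an existing search routine needs no structural change beyond the extra dimension. Swapping `toy_validation_error` for a function that builds and trains a model would be the next step in a real experiment.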

Keywords

Deep neural networks; hyperparameter optimization; CNN; architectural optimization; hyperparameter position

