
Abstract

Non-convex optimization problems with multiple local optima are frequently encountered in machine learning. The graduated optimization algorithm (GOA) is a popular method for obtaining the global optima of non-convex problems: it minimizes a sequence of locally strongly convex functions that smooth the original non-convex problem and approximate it with increasing accuracy. Recently, GradOpt, a GOA-based algorithm, has demonstrated remarkable theoretical and experimental results. However, when optimizing problems that consist of both convex and non-convex parts, GradOpt treats the entire objective function as a single non-convex function, which leads to significant gaps between the smoothed and original functions. In this study, we propose two new algorithms: SVRG-GOA and PSVRG-GOA. They gradually smooth the non-convex part of the problem and then minimize the smoothed function using either the stochastic variance reduced gradient (SVRG) method or the proximal SVRG (Prox-SVRG) method. Both algorithms are proven to have lower iteration complexity, O(1/ε), than GradOpt, O(1/ε²). Several techniques, such as a larger shrinkage factor, a projection step, stochastic gradients, and mini-batching, are also applied to accelerate the convergence of the proposed algorithms. Experimental results illustrate that the two new algorithms, which perform similarly to each other, converge to the "global" optima of a non-convex problem faster than GradOpt or non-convex Prox-SVRG.
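The idea of smoothing the non-convex part and minimizing each smoothed function with SVRG can be illustrated with a minimal Python sketch. This is not the SVRG-GOA algorithm of the paper. The one-dimensional toy problem, the constants `W` and `C`, the function names, the step size, and the schedule (`sigma0`, `shrink`, `stages`) are all hypothetical choices made for illustration. The sketch also assumes Gaussian smoothing, chosen only because it has a closed form for a cosine term; the smoothing used in the paper may differ.

```python
import numpy as np

W, C = 5.0, 1.0  # frequency and weight of the non-convex term (illustrative values)

def make_problem(n, rng):
    """Toy components f_i(x) = 0.5*b_i*(x - a_i)**2 + C*(1 - cos(W*x))."""
    b = rng.uniform(0.5, 1.5, size=n)
    a = rng.normal(size=n)
    a -= np.sum(b * a) / np.sum(b)  # centre so the global minimizer is x = 0
    return a, b

def grad_i(x, i, a, b, sigma):
    """Gradient of component i with only the non-convex term smoothed.

    For u ~ N(0, 1), E[cos(W*(x + sigma*u))] = exp(-(W*sigma)**2 / 2) * cos(W*x),
    so the smoothed gradient is available in closed form (assumed smoothing).
    """
    damp = np.exp(-0.5 * (W * sigma) ** 2)
    return b[i] * (x - a[i]) + C * W * damp * np.sin(W * x)

def svrg(x0, a, b, sigma, step=0.01, epochs=10, inner_steps=200, rng=None):
    """Plain SVRG on the objective smoothed at level sigma."""
    if rng is None:
        rng = np.random.default_rng(0)
    n = len(a)
    x_ref = float(x0)
    for _ in range(epochs):
        # Full gradient at the snapshot point.
        full_grad = np.mean([grad_i(x_ref, i, a, b, sigma) for i in range(n)])
        x = x_ref
        for _ in range(inner_steps):
            i = rng.integers(n)
            # Variance-reduced stochastic gradient.
            v = grad_i(x, i, a, b, sigma) - grad_i(x_ref, i, a, b, sigma) + full_grad
            x -= step * v
        x_ref = x
    return x_ref

def graduated_svrg(x0, a, b, sigma0=1.0, shrink=0.5, stages=6, rng=None):
    """Minimize a sequence of progressively less smoothed objectives."""
    x = x0
    for k in range(stages):
        x = svrg(x, a, b, sigma0 * shrink ** k, rng=rng)
    return svrg(x, a, b, 0.0, rng=rng)  # final pass on the unsmoothed objective

rng = np.random.default_rng(0)
a, b = make_problem(50, rng)
x_plain = svrg(3.0, a, b, sigma=0.0, rng=rng)  # no smoothing
x_grad = graduated_svrg(3.0, a, b, rng=rng)    # graduated smoothing
print(x_plain, x_grad)
```

On this toy problem, started from `x = 3.0`, the unsmoothed run settles in a nearby local minimum, while the graduated run ends close to the global minimizer at `x = 0`. The first smoothing level is large enough that the smoothed objective is essentially convex, and each later stage starts inside the basin of the global minimizer.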

Keywords

Non-convex optimization problem, global optima, stochastic variance reduced gradient, graduated optimization



SVRG for a non-convex problem using graduated optimization algorithm
