Abstract
The spectral conjugate gradient algorithm, a variant of the conjugate gradient method, is an effective method for solving unconstrained optimization problems. In this paper, based on the Hestenes–Stiefel method, two new spectral conjugate gradient algorithms, Descent Hestenes–Stiefel (DHS) and Wang–Hestenes–Stiefel (WHS), are proposed. Under the Wolfe line search and mild assumptions on the objective function, both algorithms possess the sufficient descent property without any further conditions and are globally convergent. Numerical results show that the new algorithms outperform the Hestenes–Stiefel conjugate gradient method.
Keywords
Introduction
Consider the unconstrained optimization problem (UP)

min f(x), x ∈ R^n, (1)

where f: R^n → R is continuously differentiable and its gradient is denoted by g(x) = ∇f(x).
The most commonly used method for solving this kind of problem is the conjugate gradient (CG) method, which is especially suitable for large-scale or nonlinear problems. Its convergence rate lies between those of the Newton method and the steepest descent method; the CG method avoids the Newton method's need to compute the Hessian matrix, and it also possesses the quadratic termination property. Its main iterative format is

x_{k+1} = x_k + α_k d_k, (2)

where α_k > 0 is the step length and the search direction d_k is given by

d_k = −g_k for k = 1,  d_k = −g_k + β_k d_{k−1} for k ≥ 2, (3)

where g_k = g(x_k) and β_k is a scalar parameter.
The best-known choices of the scalar β_k give the Fletcher–Reeves (FR), Polak–Ribière–Polyak (PRP), Hestenes–Stiefel (HS), and Dai–Yuan (DY) methods:

β_k^{FR} = ‖g_k‖² / ‖g_{k−1}‖²,  β_k^{PRP} = g_k^T y_{k−1} / ‖g_{k−1}‖²,
β_k^{HS} = g_k^T y_{k−1} / (d_{k−1}^T y_{k−1}),  β_k^{DY} = ‖g_k‖² / (d_{k−1}^T y_{k−1}),

where y_{k−1} = g_k − g_{k−1}.
Among these four algorithms, the FR and DY algorithms have good global convergence, while the HS and PRP algorithms have excellent numerical performance. For strictly convex quadratic functions, the HS algorithm converges in finitely many steps under exact line search; for general objective functions that are not strictly convex quadratics, however, finite-step convergence cannot be guaranteed even under exact line search, and global convergence may fail.7
Combining the advantages of the HS and DY algorithms, reference 8 proposed a new conjugate gradient method (NLS-DY).
The motivation of this paper is to combine the advantages of HS3 and NLS-DY8 in order to provide novel algorithms with better convergence.
The new algorithms
Consider the unconstrained optimization problem (1). Motivated by the literature,3,8 the DHS and WHS formulae are constructed as follows: both methods keep the iteration (2) but replace the classical direction (3) with a spectral search direction of the form (5), with the scalar parameter given by (4).
Compared with the HS algorithm, the innovation of the DHS algorithm lies in the direction d_k. In the HS algorithm, the iteration format is (2), the search direction is (3), and a standard line search is used; in the DHS algorithm, the iteration format is (2), the search direction is (5), and the Wolfe line search is used, with the same scalar parameter β_k^{HS}.
The WHS algorithm also uses the search direction (5) and the Wolfe line search, but its scalar parameter is modified and involves a constant μ > 0.
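Although the exact formulae (4) and (5) are the paper's, the key mechanism can be illustrated with the common spectral CG construction. The following is a minimal sketch, assuming the spectral direction takes the usual form with a spectral parameter θ_k (the authors' (5) may differ in details):

d_k = −θ_k g_k + β_k^{HS} d_{k−1},  θ_k = 1 + β_k^{HS} (g_k^T d_{k−1}) / ‖g_k‖²,

so that

g_k^T d_k = −θ_k ‖g_k‖² + β_k^{HS} g_k^T d_{k−1}
          = −‖g_k‖² − β_k^{HS} g_k^T d_{k−1} + β_k^{HS} g_k^T d_{k−1}
          = −‖g_k‖².

With this choice, the sufficient descent property g_k^T d_k = −‖g_k‖² holds identically, independently of the line search, which is consistent with the claim in the abstract that no further conditions are needed.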
DHS algorithm implementation process:
Step 0. Given an initial point x_1 ∈ R^n and a tolerance ε > 0, set d_1 = −g_1 and k := 1.
Step 1. If ‖g_k‖ ≤ ε, stop.
Step 2. Perform the Wolfe line search to obtain a step length α_k satisfying

f(x_k + α_k d_k) ≤ f(x_k) + δ α_k g_k^T d_k,  g(x_k + α_k d_k)^T d_k ≥ σ g_k^T d_k, (6)

where 0 < δ < σ < 1 are constants.
Step 3. Put x_{k+1} = x_k + α_k d_k, calculate β_{k+1} by formula (4) and d_{k+1} by formula (5), set k := k + 1, and go to Step 1.
WHS algorithm implementation process:
Given an initial point x_1 ∈ R^n and a tolerance ε > 0, set d_1 = −g_1 and k := 1. If ‖g_k‖ > ε, perform the Wolfe line search (6), calculate formulae (4) and (5) with the WHS scalar parameter, put x_{k+1} = x_k + α_k d_k and k := k + 1, and repeat until ‖g_k‖ ≤ ε.
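For illustration, the overall loop can be sketched in MATLAB as follows. This is a minimal sketch, not the authors' program:12 it assumes the spectral direction d_k = −θ_k g_k + β_k d_{k−1} with the HS scalar and the θ_k shown above, and uses a simple bisection implementation of the weak Wolfe conditions (6); the names dhs_sketch, wolfe_search, delta, and sigma are illustrative.

```matlab
% Minimal sketch of a DHS-type spectral CG loop (illustrative, not the
% authors' program). Assumes column-vector gradients and the direction
% d = -theta*g + beta*d with theta forcing g'*d = -norm(g)^2.
function [x, k] = dhs_sketch(f, gradf, x, epsilon, maxit)
    g = gradf(x);  d = -g;  k = 1;
    while norm(g) > epsilon && k <= maxit
        alpha = wolfe_search(f, gradf, x, d);       % Wolfe line search (6)
        x = x + alpha*d;
        g_new = gradf(x);
        y = g_new - g;                              % y_{k-1} = g_k - g_{k-1}
        beta = (g_new'*y) / (d'*y);                 % HS scalar parameter
        theta = 1 + beta*(g_new'*d)/norm(g_new)^2;  % spectral parameter
        d = -theta*g_new + beta*d;                  % spectral direction (5)
        g = g_new;  k = k + 1;
    end
end

function alpha = wolfe_search(f, gradf, x, d)
% Bisection-type search for the weak Wolfe conditions (6), with
% illustrative constants delta = 1e-4 and sigma = 0.9.
    delta = 1e-4;  sigma = 0.9;
    a = 0;  b = Inf;  alpha = 1;
    f0 = f(x);  g0d = gradf(x)'*d;
    for i = 1:50
        if f(x + alpha*d) > f0 + delta*alpha*g0d    % Armijo part fails
            b = alpha;  alpha = (a + b)/2;
        elseif gradf(x + alpha*d)'*d < sigma*g0d    % curvature part fails
            a = alpha;
            if isinf(b), alpha = 2*a; else, alpha = (a + b)/2; end
        else
            return;                                 % both conditions hold
        end
    end
end
```

For WHS, beta would instead be computed by the WHS formula (4) with the constant μ.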
Global convergence
Assumptions:9

(1) The level set Ω = {x ∈ R^n : f(x) ≤ f(x_1)} is bounded.
(2) The function f is continuously differentiable in a neighborhood Φ of Ω, and its gradient g(x) = ∇f(x) is Lipschitz continuous on Φ; that is, there exists a constant L > 0 such that ‖g(x) − g(y)‖ ≤ L‖x − y‖ for all x, y ∈ Φ.

Suppose that f satisfies the above assumptions and that x_k is generated by (2) with d_k given by (5); then it is easy to verify the sufficient descent property. Assuming that assumptions (1) and (2) hold, consider the CG method of the form (2) and (5). The global convergence proofs proceed by contradiction (reductio ad absurdum): first assume that the conclusion does not hold, and then derive a contradiction.
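For orientation, such contradiction arguments typically run along the following standard lines (a sketch of the usual Zoutendijk-condition argument, assuming the sufficient descent property g_k^T d_k = −‖g_k‖² and the assumptions above; the authors' proofs may differ in details). Under the Wolfe line search and the Lipschitz assumption, the Zoutendijk condition holds:

Σ_{k≥1} (g_k^T d_k)² / ‖d_k‖² < ∞.

If the conclusion fails, there exists γ > 0 with ‖g_k‖ ≥ γ for all k; sufficient descent gives (g_k^T d_k)² = ‖g_k‖⁴ ≥ γ²‖g_k‖², and once ‖d_k‖ is bounded above (as must be shown for the direction (5)), every term of the series is bounded away from zero, so the series diverges, a contradiction. Hence lim inf_{k→∞} ‖g_k‖ = 0.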
Theorem 2.1
Proof
Lemma 2.1
Theorem 2.2
Proof
Theorem 2.3
Proof
Numerical experiments
In this section, we use some test functions of Moré et al.11 under the Wolfe line search to compare the numerical performance of the two new spectral CG algorithms (DHS and WHS) with that of the traditional HS algorithm. The programs12 are written in MATLAB 2010b and run on a computer with an Intel(R) Core(TM) i5-5200U CPU @ 2.20 GHz and 4.00 GB SDRAM.
During the test, the parameters are set as follows
Table 1. Iterative comparison of the DHS and HS algorithms.
HS: Hestenes–Stiefel; NI: number of iterations; NF: number of times that the function is evaluated; NG: number of gradient function calculations.
Table 2. Iterative comparison of the WHS and HS algorithms.
HS: Hestenes–Stiefel; NI: number of iterations; NF: number of times that the function is evaluated; NG: number of gradient function calculations.
Table 3. Iterative comparison of the DHS and WHS algorithms.
HS: Hestenes–Stiefel; NI: number of iterations; NF: number of times that the function is evaluated; NG: number of gradient function calculations.
The sign “**” means that the run stopped because the line search procedure failed to find a step length; this indicates poor convergence of the algorithm.
The data in Table 1 show that, for most test functions, the NI, NF, and NG values of the DHS algorithm are smaller than those of HS, so the new iterative method is effective. Moreover, the running time t of DHS is clearly lower than that of HS. The reduction in the number of iterations and in the running time reflects the strong convergence of the algorithm, and the decrease of the error indicates that the algorithm has better numerical results. DHS is therefore more useful for solving unconstrained problems.
In Table 2, a different value of μ is selected for each function (at present, the choice of μ is uniformly discrete, and for each function the best-performing value is reported; see the note below).
In Table 3, after selecting an appropriate μ, for some functions (e.g., Rosenbrock, Jennrich3, Helical, and Box3) the WHS algorithm performs slightly better than DHS, while for others (e.g., Freudenstein, Beale, Powell, Wood, Kowalik, and Osborne1) the DHS algorithm performs slightly better than WHS. Overall, the DHS and WHS methods are approximately equal.
To sum up, both DHS and WHS perform better than HS. The comparison between DHS and WHS depends on the specific problem, but on average they are approximately equal.
In addition, we also report the performance profiles of WHS with uniformly discrete μ in Table 4 of Appendix 1.
Note: For each function in Tables 2 and 3, the reported value of μ is the one yielding the best iteration count.
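The source does not detail how these comparisons are aggregated. As a hedged illustration, comparisons of this kind are often summarized with Dolan–Moré performance profiles, which can be computed as in the following sketch (the cost matrix T, the solver names, and the function name perf_profile are illustrative; bsxfun is used because MATLAB 2010b lacks implicit expansion):

```matlab
% Sketch: Dolan-More performance profile from a cost matrix T
% (rows = test problems, columns = solvers; entries are costs such as
% NI, NF, NG, or time; Inf marks a failed run, e.g. the "**" entries).
function perf_profile(T, names)
    [np, ns] = size(T);
    R = bsxfun(@rdivide, T, min(T, [], 2));  % performance ratios r_{p,s}
    tau = unique(R(isfinite(R)));            % breakpoints of the profile
    rho = zeros(numel(tau), ns);
    for s = 1:ns
        for i = 1:numel(tau)
            % fraction of problems solver s solves within a factor tau(i)
            % of the best solver; failed runs (Inf) never count as solved
            rho(i, s) = sum(R(:, s) <= tau(i)) / np;
        end
    end
    stairs(tau, rho);                        % step plot of rho_s(tau)
    legend(names, 'Location', 'southeast');
    xlabel('\tau'); ylabel('\rho_s(\tau)');
end
```

A call such as perf_profile([ni_dhs ni_whs ni_hs], {'DHS','WHS','HS'}) (with hypothetical per-problem iteration-count vectors) would then plot one curve per solver, with higher curves indicating better overall performance.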
Conclusion
In this paper, based on the classical HS method, we have presented two improved CG methods, namely the DHS and WHS methods.
In Section 3, we obtain the following theoretical results:
The DHS method has the sufficient descent property and is globally convergent if the Wolfe line search (6) is used with suitably chosen parameters.
The WHS method has the sufficient descent property and is globally convergent if the Wolfe line search (6) is used with suitably chosen parameters (including the constant μ).
On the other hand, numerical results reported in Section 4 show that:
The average performance of the DHS and WHS methods proposed in this paper is generally better than that of the HS method.
The average performance of the DHS and WHS methods is approximately equal.
Acknowledgements
The authors are very grateful to the anonymous referees for their valuable comments and useful suggestions, which improved the quality of this paper.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: National Natural Science Foundation of China (Nos. 51405424, 51675461, and 11673040) and Basic Research Project of Yanshan University (16LGY012).
