We present a family of new inexact secant methods, combined with an Armijo line search technique, for solving nonconvex constrained optimization problems. Different from the existing secant methods, the algorithms proposed in this paper need not compute exact search directions. By adopting the nonsmooth exact penalty function as the merit function, the global convergence of the proposed algorithms is established under reasonable conditions. Numerical results indicate that the proposed algorithms are both feasible and effective.
We assume that the nonlinear equality constrained problem to be solved is stated as
$$\min_{x\in\mathbb{R}^n} f(x) \quad \text{subject to} \quad c(x)=0, \qquad (1)$$
where $f:\mathbb{R}^n\to\mathbb{R}$ and $c:\mathbb{R}^n\to\mathbb{R}^m$ are smooth and possibly nonconvex functions.
The reduced Hessian methods in successive quadratic programming (SQP)1,2 and the secant methods (two-step algorithms) defined in Fontecilla3 are two of the most successful classes of methods for solving problem (1). Compared with the widely used reduced Hessian methods, in which the orthonormal basis for the tangent space of the constraints at the current point $x_k$ may fail to vary continuously with $k$,4,5 the main advantage of the secant methods rests in their use of an orthogonal projection operator that is continuous.
Two basic approaches, namely line search and trust region, have been developed in order to ensure global convergence toward local minima. In Byrd et al.,6 an algorithm based on a characterization of inexact sequential quadratic programming (SQP) steps that ensures global convergence was presented for large-scale equality constrained optimization; an exact penalty function was used to determine whether a given inexact step makes sufficient progress toward a solution of the nonlinear program. The inexact Newton method was initially proposed by Dembo et al.7 for solving large systems of nonlinear equations, and Dembo and Steihaug8 used that procedure to solve unconstrained minimization problems. More recently, Byrd et al.9 proposed an inexact Newton method for nonconvex equality constrained optimization; their method allows for the presence of negative curvature without requiring information about the inertia of the primal–dual iteration matrix. Subsequently, an inexact Newton method with a filter line search was proposed in Wang et al.10 for the same problem. In Gu and Zhu11 and Wang et al.,12 inexact secant algorithms were proposed for solving large-scale nonlinear systems of equalities and inequalities and nonlinear constrained convex optimization, respectively.
In this paper, we propose a class of new inexact secant methods with an Armijo line search for nonconvex problems. The methods are globalized by a line search technique that adopts the $\ell_1$ penalty function as the merit function. The paper is organized as follows. In the next section, the algorithms are developed. Then, we analyze their global convergence properties. The results of numerical experiments with these methods are discussed in the Numerical results section, and final remarks are provided in the Conclusions section.
The proposed algorithms
Throughout the paper, we denote the Euclidean vector or matrix norm by $\|\cdot\|$, the $\ell_1$ norm by $\|\cdot\|_1$ and the gradient of $f$ by $g(x)$. $\ell$ is the Lagrangian function defined for $x\in\mathbb{R}^n$ and $\lambda\in\mathbb{R}^m$ by
$$\ell(x,\lambda) = f(x) + \lambda^T c(x).$$
The gradient of the Lagrangian function with respect to $x$ is denoted $\nabla_x\ell(x,\lambda) = g(x) + A(x)^T\lambda$, where $A(x)$ is the Jacobian of $c$. We will be using the BFGS secant update defined by
$$W_{k+1} = W_k - \frac{W_k s_k s_k^T W_k}{s_k^T W_k s_k} + \frac{y_k y_k^T}{y_k^T s_k},$$
where $s_k = x_{k+1} - x_k$ and $y_k = \nabla_x\ell(x_{k+1},\lambda_{k+1}) - \nabla_x\ell(x_k,\lambda_{k+1})$. We will write $f_k = f(x_k)$, $g_k = g(x_k)$, $c_k = c(x_k)$, $A_k = A(x_k)$ and $P_k = P(x_k)$.
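For illustration, the BFGS secant update above can be realized as in the following minimal NumPy sketch (the experiments reported below were run in MATLAB); the curvature safeguard that skips the update is an assumption added for robustness, not a part of the method as stated.

```python
import numpy as np

def bfgs_update(W, s, y, curvature_tol=1e-8):
    """One BFGS secant update of the Lagrangian Hessian approximation W.

    s = x_{k+1} - x_k and y is the difference of Lagrangian gradients.
    The update is skipped when the curvature condition y^T s > 0 fails
    (a common safeguard, assumed here rather than taken from the paper).
    """
    Ws = W @ s
    sWs = s @ Ws
    ys = y @ s
    if ys <= curvature_tol * np.linalg.norm(s) * np.linalg.norm(y):
        return W  # skip the update to keep W positive definite
    return W - np.outer(Ws, Ws) / sWs + np.outer(y, y) / ys
```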
The following is the basic idea of the secant methods. At the $k$th iteration,
$$\lambda_{k+1} = \lambda_k + \Delta\lambda_k, \qquad (2a)$$
$$W_k\,\tilde u_k = -\nabla_x\ell(x_k,\lambda_{k+1}), \qquad (2b)$$
$$u_k = P_k\,\tilde u_k, \qquad (2c)$$
$$v_k = -A(x_k)^+\,c_k, \qquad (2d)$$
$$x_{k+1} = x_k + d_k, \quad d_k = u_k + v_k, \qquad (2e)$$
$$W_{k+1}\ \text{given by the BFGS update above}, \qquad (2f)$$
where $A(x)^+$ is the pseudo-inverse of the constraint Jacobian $A(x)$. The projection $P(x)$ onto the null space of $A(x)$ can be either the orthogonal projection given by
$$P(x) = I - A(x)^T\big(A(x)A(x)^T\big)^{-1}A(x), \qquad (3a)$$
or the oblique projection defined by
$$P(x) = I - W^{-1}A(x)^T\big(A(x)W^{-1}A(x)^T\big)^{-1}A(x). \qquad (3b)$$
The multiplier update in equation (2a) can be chosen from one of the following updates.
Projection update:
$$\lambda_{k+1} = -\big(A_kA_k^T\big)^{-1}A_k\,g_k. \qquad (4a)$$
Null-space update:
$$\lambda_{k+1} = -\big(A_kW_k^{-1}A_k^T\big)^{-1}A_kW_k^{-1}g_k. \qquad (4b)$$
Newton update:
$$\lambda_{k+1} = \big(A_kW_k^{-1}A_k^T\big)^{-1}\big(c_k - A_kW_k^{-1}g_k\big). \qquad (4c)$$
The pseudo-inverse of $A_k$ will be either
$$A_k^+ = A_k^T\big(A_kA_k^T\big)^{-1} \qquad (5a)$$
or
$$A_k^+ = W_k^{-1}A_k^T\big(A_kW_k^{-1}A_k^T\big)^{-1}. \qquad (5b)$$
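The projections in equation (3) and the pseudo-inverses in equation (5) involve only solves with the small $m \times m$ matrices $AA^T$ or $AW^{-1}A^T$. A minimal NumPy sketch is given below; forming the projectors explicitly is for illustration only, since a practical code would apply them as operators.

```python
import numpy as np

def orthogonal_projection(A):
    """P = I - A^T (A A^T)^{-1} A, orthogonal projector onto null(A), cf. (3a)."""
    n = A.shape[1]
    return np.eye(n) - A.T @ np.linalg.solve(A @ A.T, A)

def oblique_projection(A, W):
    """P = I - W^{-1} A^T (A W^{-1} A^T)^{-1} A, oblique projector, cf. (3b)."""
    n = A.shape[1]
    WinvAT = np.linalg.solve(W, A.T)           # W^{-1} A^T
    return np.eye(n) - WinvAT @ np.linalg.solve(A @ WinvAT, A)

def pseudo_inverse(A, W=None):
    """A^+ = A^T (A A^T)^{-1}, cf. (5a); with W given,
    A^+ = W^{-1} A^T (A W^{-1} A^T)^{-1}, cf. (5b)."""
    if W is None:
        return np.linalg.solve(A @ A.T, A).T   # (5a), using symmetry of A A^T
    WinvAT = np.linalg.solve(W, A.T)
    return WinvAT @ np.linalg.inv(A @ WinvAT)  # (5b)
```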
By choosing different projection operators in equation (3), different multiplier updates in equation (4) and different pseudo-inverses in equation (5), we obtain six algorithms, which are listed in Table 1.
Table 1. Algorithms.

| Algorithm | Multiplier update | P(x) | $A_k^+$ |
|---|---|---|---|
| 1 | (4b) | I | (5b) |
| 2 | (4a) | (3b) | (5b) |
| 3 | (4c) | (3a) | (5a) |
| 4 | (4b) | I | (5a) |
| 5 | (4a) | (3b) | (5a) |
| 6 | (4c) | (3a) | (5b) |
Next, we outline the algorithm and the globalization strategy. An integral part of the approach is the mechanism used to determine whether a trial step $d$ is acceptable during a given iteration. A merit function is commonly used for this purpose. The nondifferentiable $\ell_1$ merit function and Fletcher's exact and differentiable merit function are representative of most merit functions used in practice. The evaluation of the $\ell_1$ merit function, compared with that of Fletcher's augmented Lagrangian merit function, is very inexpensive. One of the former's potential drawbacks is that it can suffer from the Maratos effect, but several strategies have been developed to remove its damaging effects. We therefore adopt the $\ell_1$ penalty function as the merit function
$$\phi_\omega(x) = f(x) + \omega\,\|c(x)\|_1, \qquad (6)$$
where $\omega > 0$ is known as the penalty parameter. If $\omega$ is greater than a certain threshold, then a first-order optimal point of equation (1) is a stationary point of $\phi_\omega$. Although $\phi_\omega$ is not continuously differentiable, the directional derivative of $\phi_\omega$ in a direction $d$, denoted by $D\phi_\omega(x;d)$, is nonnegative at a stationary point $x^*$ for all $d\in\mathbb{R}^n$.
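A direct rendering of the merit function (6), and of the directional derivative formula established below in Lemma 1, might look as follows; this is an illustrative sketch in which f, c and g stand for user-supplied callables.

```python
import numpy as np

def phi(x, omega, f, c):
    """l1 exact penalty merit function, equation (6): f(x) + omega*||c(x)||_1."""
    return f(x) + omega * np.linalg.norm(c(x), 1)

def dphi(x, d, omega, g, c):
    """Directional derivative of phi along d when A(x) d = -c(x) (cf. Lemma 1):
    D phi_omega(x; d) = g(x)^T d - omega * ||c(x)||_1."""
    return g(x) @ d - omega * np.linalg.norm(c(x), 1)
```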
In this paper, we perform a backtracking line search to compute a step length coefficient $\alpha_k$ satisfying the Armijo condition
$$\phi_\omega(x_k + \alpha_k d_k) \le \phi_\omega(x_k) + \eta\,\alpha_k\,D\phi_\omega(x_k; d_k), \qquad (7)$$
where the constant $\eta\in(0,1)$. Considering a local model of the merit function around the current iterate $x_k$, the approximation has the form
$$m_k(d) = f_k + g_k^T d + \omega\,\|c_k + A_k d\|_1.$$
Hence, we can estimate the reduction in the merit function by evaluating
$$\Delta m_k(d) = m_k(0) - m_k(d) = -g_k^T d + \omega\,\big(\|c_k\|_1 - \|c_k + A_k d\|_1\big).$$
Once an acceptable step is obtained, we must ensure that a positive $\alpha_k$ can be calculated to satisfy the Armijo condition. We will consider this issue in Lemma 1.
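A backtracking realization of the Armijo condition (7) is sketched below; the constants eta, tau and alpha_min are illustrative choices, not values prescribed by the method.

```python
import numpy as np

def armijo_backtracking(x, d, omega, f, c, g,
                        eta=1e-4, tau=0.5, alpha_min=1e-12):
    """Backtracking line search for the l1 merit function: returns alpha with
    phi(x + alpha*d) <= phi(x) + eta*alpha*D phi(x; d), cf. equation (7)."""
    merit = lambda z: f(z) + omega * np.linalg.norm(c(z), 1)
    # directional derivative from Lemma 1 (valid when A(x) d = -c(x))
    slope = g(x) @ d - omega * np.linalg.norm(c(x), 1)
    phi0, alpha = merit(x), 1.0
    while merit(x + alpha * d) > phi0 + eta * alpha * slope:
        alpha *= tau
        if alpha < alpha_min:
            raise RuntimeError("line search failed: step became too small")
    return alpha
```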
For nonconvex equality constrained optimization, that is, problem (1), using the method proposed in Byrd et al.,9 we have the reduction condition
$$\Delta m_k(d_k) \ge \tfrac{\sigma}{2}\,\|u_k\|^2 + \sigma\,\omega\,\big(\|c_k\|_1 - \|c_k + A_k d_k\|_1\big), \qquad (8)$$
where $\sigma\in(0,1)$. The tangential component $u_k$ may not be available explicitly, and so we are not able to compute the norm of this tangential component directly. This measure can be approximated by $d_k^T W_k d_k$, but in general we desire a quantity $\Theta_k$ satisfying
$$0 \le \Theta_k \le \|u_k\|^2 \qquad (9)$$
that is as close to $\|u_k\|^2$ as possible so that equation (8) is not overly restrictive.
In this paper, we will compute the search direction inexactly, which means that equation (2b) will be substituted by
$$W_k\,\tilde u_k = -\nabla_x\ell(x_k,\lambda_{k+1}) + r_k, \qquad (10)$$
where $r_k$ is the residual vector.
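One natural way for the residual $r_k$ in equation (10) to arise is from a truncated iterative solver. The sketch below uses a plain conjugate gradient loop with a relative residual test; this is an illustrative assumption, since for an indefinite $W_k$ a practical code would add negative-curvature safeguards.

```python
import numpy as np

def inexact_solve(W, b, tol_rel=0.1, max_iter=None):
    """Approximately solve W u = b by conjugate gradients, stopping once
    ||b - W u|| <= tol_rel * ||b||; returns u and the final residual."""
    n = b.size
    max_iter = max_iter or 2 * n
    u = np.zeros(n)
    r = b.copy()              # residual b - W u (u starts at zero)
    p = r.copy()
    rr = r @ r
    for _ in range(max_iter):
        if np.sqrt(rr) <= tol_rel * np.linalg.norm(b):
            break
        Wp = W @ p
        alpha = rr / (p @ Wp)  # assumes positive curvature along p
        u += alpha * p
        r -= alpha * Wp
        rr_new = r @ r
        p = r + (rr_new / rr) * p
        rr = rr_new
    return u, r
```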
The model reduction condition in this work is
$$\Delta m_k(d_k) \ge \tfrac{\sigma}{2}\,\Theta_k + \sigma\,\omega_k\,\big(\|c_k\|_1 - \|c_k + A_k d_k\|_1\big). \qquad (11)$$
If the model reduction condition is not satisfied for the most recent value of the penalty parameter, we have the option of increasing the penalty parameter in order to satisfy it. This, however, should only be done in two circumstances. If $d_k^T W_k d_k > 0$, it is safe to assume that the problem is sufficiently convex, and so $d_k$ should be accepted. If $d_k^T W_k d_k \le 0$, we only consider increasing $\omega$ if $v_k$ is a significant component of $d_k$. We can express this condition as $\psi_k \ge \epsilon\,\|d_k\|^2$ for some $\epsilon > 0$, where $\psi_k$ is any lower bound for the squared norm $\|v_k\|^2$ of the normal step component. If none of the above conditions can be satisfied for the model reduction condition, we modify $W_k$. (It is assumed that after a finite number of modifications we have $d_k^T W_k d_k > 0$.) One technique is to modify $W_k$ to increase some or all of its eigenvalues so that the resulting matrix is closer to being positive definite.
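The decision logic of this paragraph can be summarized schematically as follows; the constants eps and mu are illustrative assumptions, and psi_lb stands for the lower bound $\psi_k$ discussed above.

```python
import numpy as np

def penalty_or_modify(W, d, psi_lb, eps=0.1, mu=1e-4):
    """Schematic version of the update logic described above. When the model
    reduction condition (11) fails: allow an increase of omega if the curvature
    along d is positive or the normal component is significant; otherwise
    modify W. psi_lb is a lower bound for ||v_k||^2."""
    dWd = d @ (W @ d)
    if dWd > 0 or psi_lb >= eps * (d @ d):
        return "increase_omega", W
    # shift the eigenvalues of W upward, toward positive definiteness
    shift = abs(min(np.linalg.eigvalsh(W).min(), 0.0)) + mu
    return "retry_with_modified_W", W + shift * np.eye(W.shape[0])
```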
We now formally state the globalized inexact secant methods for solving equation (1).
Algorithms
s1. Initialize. Choose a starting point $x_0$. Set constants $\eta\in(0,1)$, $\sigma\in(0,1)$, $\kappa\in(0,1)$, $\delta_\omega>0$, $\epsilon>0$ and $\omega_0>0$. Give $\lambda_0$ and $W_0$, and let $k:=0$.
s2. If $\|\nabla_x\ell(x_k,\lambda_k)\| + \|c_k\| \le \epsilon$, then stop.
s3. Evaluate $f_k$, $g_k$, $c_k$, $A_k$ and $W_k$ from equation (2f). Then compute the multiplier $\lambda_{k+1}$ from equation (4).
s4. Find an inexact step $d_k = P_k\tilde u_k + v_k$ from
$$W_k\,\tilde u_k = -\nabla_x\ell(x_k,\lambda_{k+1}) + r_k, \qquad v_k = -A_k^+ c_k,$$
satisfying Condition I or II, where $P_k$ is given by equation (3) and $A_k^+$ is given by equation (5).
Condition I
$$\|r_k\| \le \kappa\,\|\nabla_x\ell(x_k,\lambda_{k+1})\| \quad\text{and}\quad \Delta m_k(d_k) \ge \tfrac{\sigma}{2}\,\Theta_k + \sigma\,\omega_k\,\big(\|c_k\|_1 - \|c_k + A_k d_k\|_1\big),$$
where $\kappa\in(0,1)$ and $\Theta_k$ satisfies equation (9).
Condition II
$$\|r_k\| \le \kappa\,\|c_k\|,$$
where $\kappa\in(0,1)$. If Condition I or II cannot be satisfied, then modify $W_k$ by increasing some or all of its eigenvalues.
s5. If Condition II is satisfied and
$$\Delta m_k(d_k) < \tfrac{\sigma}{2}\,\Theta_k + \sigma\,\omega\,\big(\|c_k\|_1 - \|c_k + A_k d_k\|_1\big)$$
for $\omega = \omega_k$, where $\Theta_k$ satisfies equation (9), then set
$$\omega_{k+1} = \frac{g_k^T d_k + \tfrac{\sigma}{2}\,\Theta_k}{(1-\sigma)\big(\|c_k\|_1 - \|c_k + A_k d_k\|_1\big)} + \delta_\omega;$$
otherwise set $\omega_{k+1} = \omega_k$.
s6. Choose a step length $\alpha_k$ by backtracking until the Armijo condition (7), with $\omega = \omega_{k+1}$, is satisfied.
s7. Set $x_{k+1} = x_k + \alpha_k d_k$ and $k := k+1$. Go to s2.
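To make the flow of s1–s7 concrete, a compressed driver in the flavour of Algorithm 1 ($P_k = I$ with the null-space update (4b) and pseudo-inverse (5b)) is sketched below. It assembles the helper routines sketched earlier (bfgs_update, inexact_solve, armijo_backtracking); Conditions I and II are simplified to the fixed relative residual test inside inexact_solve, and s5 is replaced by a plain tenfold increase of $\omega$, so this is a schematic rendering rather than the algorithm itself.

```python
import numpy as np

def inexact_secant_a1(f, g, c, A, x0, lam0, omega0=1.0, eps=1e-6, max_iter=1000):
    """Schematic driver for steps s1-s7 in the flavour of Algorithm 1."""
    x, lam, omega = np.array(x0, float), np.array(lam0, float), omega0
    W = np.eye(x.size)                                         # s1
    for _ in range(max_iter):
        gk, ck, Ak = g(x), c(x), A(x)
        if np.linalg.norm(gk + Ak.T @ lam) + np.linalg.norm(ck) <= eps:
            break                                              # s2
        WinvAT = np.linalg.solve(W, Ak.T)                      # W_k^{-1} A_k^T
        lam = -np.linalg.solve(Ak @ WinvAT, WinvAT.T @ gk)     # (4b)
        u, _ = inexact_solve(W, -(gk + Ak.T @ lam))            # inexact (2b)
        v = -WinvAT @ np.linalg.solve(Ak @ WinvAT, ck)         # (2d) with (5b)
        d = u + v
        while np.linalg.norm(ck, 1) > 0 and \
                gk @ d - omega * np.linalg.norm(ck, 1) >= 0:
            omega *= 10.0                                      # simplified s5
        alpha = armijo_backtracking(x, d, omega, f, c, g)      # s6
        s = alpha * d
        y = (g(x + s) + A(x + s).T @ lam) - (gk + Ak.T @ lam)
        W = bfgs_update(W, s, y)                               # (2f)
        x = x + s                                              # s7
    return x, lam
```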
Lemma 1. The directional derivative of the merit function $\phi_\omega$ along a step $d_k$ satisfies
$$D\phi_\omega(x_k; d_k) = g_k^T d_k - \omega\,\|c_k\|_1. \qquad (12)$$
Proof. From equation (3) and equation (10), we know that $A_k d_k = -c_k$. Using Taylor's theorem, we have that
$$\phi_\omega(x_k + \alpha d_k) - \phi_\omega(x_k) = \alpha\,g_k^T d_k + \omega\big(\|c_k + \alpha A_k d_k\|_1 - \|c_k\|_1\big) + O(\alpha^2) = \alpha\big(g_k^T d_k - \omega\,\|c_k\|_1\big) + O(\alpha^2),$$
where the second equality uses $\|c_k + \alpha A_k d_k\|_1 = (1-\alpha)\|c_k\|_1$ for $\alpha\in[0,1]$. Dividing both sides by $\alpha$ and taking the limit as $\alpha\to 0^+$ yields the result. □
It is easy to prove that, if $d_k$ satisfies the following model reduction condition
$$\Delta m_k(d_k) \ge \tfrac{\sigma}{2}\,\Theta_k + \sigma\,\omega\,\big(\|c_k\|_1 - \|c_k + A_k d_k\|_1\big), \qquad (13)$$
where $\Delta m_k(d_k) = -g_k^T d_k + \omega\big(\|c_k\|_1 - \|c_k + A_k d_k\|_1\big)$ and $\sigma\in(0,1)$, then the directional derivative of the merit function satisfies
$$D\phi_\omega(x_k; d_k) \le -\tfrac{\sigma}{2}\,\Theta_k - \sigma\,\omega\,\|c_k\|_1 < 0,$$
which ensures that $d_k$ is a descent direction for $\phi_\omega$.
Global convergence
In this section, we will establish the global convergence of the proposed algorithms under the following assumptions.
Assumptions G. The sequences generated by the algorithms are contained in a convex set Ω and the following properties hold:
G1. $f$ and $c$ are bounded and twice continuously differentiable on Ω.
G2. The sequences $\{\lambda_k\}$ and $\{W_k\}$ are bounded.
G3. The sequence $\{W_k^{-1}\}$ is bounded.
G4. The matrices $A_k$ have full row rank and their smallest singular values are bounded below by some positive constant.
Lemma 2. The sequence $\{v_k\}$ given by equation (2d) is bounded and satisfies
$$\|v_k\| \le \gamma_1\,\|c_k\| \qquad (14)$$
for some $\gamma_1 > 0$.
Proof. From equations (2d), (5a) and (5b), we have that
$$v_k = -A_k^T\big(A_k A_k^T\big)^{-1} c_k \qquad (15)$$
or
$$v_k = -W_k^{-1}A_k^T\big(A_k W_k^{-1} A_k^T\big)^{-1} c_k. \qquad (16)$$
Assumptions G1, G2 and G4 imply that $A_k^T(A_kA_k^T)^{-1}$ and $W_k^{-1}A_k^T(A_kW_k^{-1}A_k^T)^{-1}$ are bounded; then, from equations (15) and (16), the result follows. □
Lemma 3. Suppose Assumptions G hold. Then the sequence of penalty parameters $\{\omega_k\}$ is bounded above, and $\omega_k = \bar\omega$ for some $\bar\omega > 0$ when $k$ is sufficiently large.
Proof. In our algorithm, $\omega_k$ is increased only if Condition II is satisfied and equation (11) fails. It means that each increase raises $\omega_k$ by at least $\delta_\omega > 0$, while, by Lemma 2 and Assumptions G, the values prescribed in s5 remain uniformly bounded; hence $\omega_k$ can be increased at most finitely many times, and the result follows. □
From the above lemma, we can easily prove the following result (see Lemma 5 in Byrd et al.9).
Lemma 4. The sequence of step lengths $\{\alpha_k\}$ generated in s6 is bounded below and away from zero.
We now state the main result as follows.
Theorem 1. Let $\{x_k\}$ be a sequence generated by the proposed algorithms. Suppose Assumptions G hold; then
$$\lim_{k\to\infty}\Big(\big\|g_k + A_k^T\lambda_k\big\| + \|c_k\|\Big) = 0.$$
Proof. By equation (13) and Lemma 4, there exists an $\bar\alpha > 0$ such that for all $k$
$$\phi_{\bar\omega}(x_{k+1}) \le \phi_{\bar\omega}(x_k) - \eta\,\bar\alpha\,\Delta m_k(d_k).$$
This, along with the fact that $\phi_{\bar\omega}$ is bounded below and Assumption G1, implies
$$\lim_{k\to\infty}\Delta m_k(d_k) = 0.$$
Using a way similar to Lemma 4.1 and Lemma 4.2 in Gu and Zhu,11 we can get
$$\lim_{k\to\infty}\big\|g_k + \bar\omega\,A_k^T\chi_k\big\| = 0,$$
where the components of $\chi_k$ can have the values 1, −1 or 0. This, along with the limits above and Assumption G2, implies that the theorem is true. □
Numerical results
In this section, we report some numerical experiments with the proposed algorithms, which were implemented in MATLAB 7.0. The numerical results were obtained by running our algorithms on a set of 44 equality constrained problems, which can be found on the web page: http://www.gamsworld.org/performance/princetonlib/htm/group5stat.htm.
In each case, the starting point supplied with the problem was used. All attempts to solve the test problems were limited to a maximum of 1000 iterations or half an hour of CPU time. The algorithm parameters were set to fixed values that seemed to work reasonably well for a broad class of problems. The Jacobian matrices and the Hessian matrices were approximated by finite differences.
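For example, the Jacobian approximation can be formed by forward differences as in the following sketch; the step size h is an illustrative choice, not the value used in our experiments.

```python
import numpy as np

def fd_jacobian(c, x, h=1e-7):
    """Forward-difference approximation of the constraint Jacobian A(x)."""
    c0 = np.asarray(c(x))
    J = np.zeros((c0.size, x.size))
    for j in range(x.size):
        xp = x.copy()
        xp[j] += h                      # perturb one coordinate at a time
        J[:, j] = (np.asarray(c(xp)) - c0) / h
    return J
```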
In Table 2, we present the names of the 44 problems together with their numbers of variables (n), numbers of equality constraints (m) and best-known objective values f*. Tables 3 to 5 list the numerical results obtained on these problems with Algorithms 1–6. In the tables, iter, nf, nc and ng stand for the number of iterations, the number of evaluations of f, the number of evaluations of c and the number of gradient evaluations of f, respectively. The CPU times (in seconds) are denoted by cput. A "–" means that the algorithm failed. A "(*)" means that the problem was strictly convex over the set of computed iterates. In every algorithm, the number of gradient evaluations of f, ng, is equal to the number of gradient evaluations of c, so the latter is not listed separately.
Basic data of the test functions.
Name
n
m
Name
n
m
bt2
3
1
0.0325
bt3
5
3
4.0930
bt4
3
2
−45.5105
bt5
3
2
961.7151
bt6
5
2
0.2770
bt7
5
3
306.1999
bt9
4
2
−1.0000
bt10
2
2
−1.0000
bt11
5
3
0.8248
bt12
5
3
6.1888
catena
32
11
−23077.7462
dtoc1nd
735
490
12.5277
eigena2
110
55
0.0000
eigenaco
110
55
0.0000
eigenb2
110
55
18.0000
eigenbco
110
55
9.0000
eigenc2
462
231
0.0000
eigencco
30
15
0.0000
fccu
19
8
11.1491
genhs28
10
8
0.9271
gilbert
1000
1
482.0272
hs006
2
1
0.0000
hs007
2
1
−1.7320
hs008
2
2
−1.0000
hs026
3
1
0.0000
hs027
3
1
0.0400
hs028
3
1
0.0000
hs039
4
2
−1.0000
hs040
4
3
−0.2500
hs046
5
2
0.0210
hs047
5
3
0.0000
hs048
5
2
0.0000
hs049
5
2
0.0000
hs050
5
3
0.0000
hs051
5
3
0.0000
hs052
5
3
5.3266
hs077
5
2
0.2415
hs078
5
3
−2.9197
hs079
5
3
0.0787
hs100lnp
7
2
680.6300
hs111lnp
10
3
−47.7614
maratos
2
1
−1.0000
mwright
5
3
24.9788
orthregb
27
6
0.0000
Table 3. Numerical comparisons (columns 2–6: Algorithm 1; columns 7–11: Algorithm 2).

| Name | iter | nf | ng | nc | cput | iter | nf | ng | nc | cput |
|---|---|---|---|---|---|---|---|---|---|---|
| bt2(*) | 19 | 22 | 39 | 41 | 0.464 | 33 | 44 | 67 | 77 | 0.059 |
| bt3(*) | 14 | 30 | 29 | 44 | 0.088 | 24 | 48 | 49 | 72 | 0.150 |
| bt4 | 17 | 39 | 35 | 56 | 0.056 | 27 | 65 | 55 | 92 | 0.072 |
| bt5 | 7 | 8 | 15 | 15 | 0.700 | 9 | 12 | 19 | 21 | 0.057 |
| bt6(*) | 30 | 61 | 61 | 91 | 0.135 | 28 | 45 | 57 | 73 | 0.232 |
| bt7 | 65 | 383 | 131 | 448 | 0.345 | – | – | – | – | – |
| bt9 | 14 | 26 | 29 | 40 | 0.056 | 14 | 25 | 29 | 39 | 0.051 |
| bt10(*) | 6 | 6 | 13 | 12 | 0.159 | 2 | 2 | 5 | 4 | 0.006 |
| bt11(*) | 12 | 18 | 25 | 30 | 0.208 | 12 | 16 | 25 | 28 | 0.087 |
| bt12(*) | 12 | 20 | 25 | 32 | 0.205 | 28 | 77 | 57 | 105 | 0.194 |
| catena | 23 | 35 | 44 | 48 | 2.608 | 31 | 39 | 47 | 55 | 2.887 |
| dtoc1nd | 24 | 29 | 35 | 35 | 3.007 | 24 | 30 | 33 | 37 | 3.153 |
| eigena2 | 26 | 95 | 55 | 122 | 1660.4 | 26 | 98 | 57 | 126 | 1780.4 |
| eigenaco | 2 | 4 | 5 | 6 | 121.399 | 2 | 4 | 5 | 6 | 114.150 |
| eigenb2 | 3 | 5 | 7 | 8 | 126.209 | 3 | 5 | 7 | 8 | 133.324 |
| eigenbco | 2 | 3 | 5 | 5 | 119.773 | 2 | 3 | 5 | 5 | 112.864 |
| eigenc2 | 3 | 6 | 7 | 7 | 134.112 | 3 | 6 | 7 | 7 | 129.316 |
| eigencco | 18 | 29 | 37 | 47 | 2.335 | – | – | – | – | – |
| fccu(*) | 22 | 52 | 45 | 74 | 1.527 | 21 | 48 | 43 | 69 | 1.285 |
| genhs28(*) | 6 | 7 | 13 | 13 | 0.222 | 8 | 9 | 17 | 17 | 0.398 |
| gilbert | 2 | 5 | 6 | 8 | 120.231 | 2 | 5 | 6 | 8 | 118.883 |
| hs006 | 19 | 71 | 39 | 90 | 0.061 | – | – | – | – | – |
| hs007 | 15 | 51 | 31 | 66 | 0.105 | 10 | 20 | 21 | 30 | 0.029 |
| hs008(*) | 2 | 3 | 5 | 5 | 0.130 | 2 | 3 | 5 | 5 | 0.005 |
| hs026 | 203 | 269 | 407 | 472 | 0.444 | 127 | 625 | 255 | 752 | 0.275 |
| hs027(*) | 206 | 291 | 412 | 311 | 69.903 | – | – | – | – | – |
| hs028(*) | 8 | 10 | 17 | 18 | 0.0292 | 9 | 11 | 19 | 20 | 0.0312 |
| hs039 | 14 | 26 | 29 | 40 | 0.084 | 14 | 25 | 29 | 39 | 0.083 |
| hs040(*) | 5 | 5 | 11 | 10 | 0.046 | 5 | 5 | 11 | 10 | 0.042 |
| hs046(*) | 62 | 497 | 124 | 559 | 3.117 | – | – | – | – | – |
| hs047 | 39 | 281 | 78 | 320 | 2.323 | 12 | 56 | 24 | 69 | 0.732 |
| hs048(*) | 8 | 17 | 17 | 25 | 0.047 | 9 | 18 | 19 | 27 | 0.047 |
| hs049(*) | 31 | 42 | 63 | 73 | 0.249 | 31 | 42 | 63 | 73 | 0.157 |
| hs050(*) | 15 | 26 | 31 | 41 | 0.101 | 15 | 26 | 31 | 41 | 0.097 |
| hs051(*) | 9 | 19 | 19 | 28 | 0.063 | 10 | 20 | 21 | 30 | 0.066 |
| hs052(*) | 14 | 37 | 29 | 51 | 0.098 | 11 | 29 | 23 | 40 | 0.075 |
| hs077(*) | 24 | 32 | 49 | 56 | 0.115 | 21 | 24 | 43 | 45 | 0.100 |
| hs078(*) | 6 | 7 | 13 | 13 | 0.540 | 6 | 7 | 13 | 13 | 0.107 |
| hs079(*) | 12 | 17 | 25 | 29 | 0.083 | 16 | 23 | 33 | 39 | 0.095 |
| hs100lnp | 51 | 22 | 103 | 133 | 0.407 | 57 | 89 | 115 | 146 | 0.440 |
| hs111lnp | 96 | 256 | 193 | 352 | 2.083 | 223 | 844 | 447 | 1067 | 4.870 |
| maratos(*) | 4 | 4 | 9 | 8 | 6.705 | 4 | 4 | 9 | 8 | 3.100 |
| mwright(*) | 21 | 37 | 43 | 58 | 139.119 | 24 | 77 | 49 | 101 | 158.353 |
| orthregb | 12 | 13 | 25 | 25 | 1.046 | 11 | 11 | 23 | 22 | 0.834 |
Table 4. Numerical comparisons, continued (columns 2–6: Algorithm 3; columns 7–11: Algorithm 4).

| Name | iter | nf | ng | nc | cput | iter | nf | ng | nc | cput |
|---|---|---|---|---|---|---|---|---|---|---|
| bt2(*) | 19 | 22 | 39 | 41 | 0.464 | 33 | 44 | 67 | 77 | 0.059 |
| bt3(*) | 14 | 30 | 29 | 44 | 0.088 | 24 | 48 | 49 | 72 | 0.150 |
| bt4 | 17 | 39 | 35 | 56 | 0.056 | 27 | 65 | 55 | 92 | 0.072 |
| bt5 | 10 | 13 | 21 | 23 | 0.040 | 8 | 8 | 17 | 16 | 0.034 |
| bt6(*) | 26 | 42 | 53 | 68 | 0.113 | 29 | 58 | 59 | 87 | 0.128 |
| bt7 | 58 | 270 | 117 | 328 | 0.305 | 32 | 118 | 65 | 150 | 0.167 |
| bt9 | 14 | 21 | 29 | 35 | 0.053 | 15 | 29 | 31 | 44 | 0.055 |
| bt10(*) | 4 | 4 | 9 | 8 | 0.010 | 7 | 7 | 15 | 14 | 0.016 |
| bt11(*) | 12 | 18 | 25 | 30 | 0.087 | 12 | 18 | 25 | 30 | 0.088 |
| bt12(*) | 40 | 135 | 81 | 175 | 0.272 | 15 | 26 | 31 | 41 | 0.106 |
| catena | 28 | 37 | 54 | 56 | 3.038 | 33 | 40 | 57 | 58 | 2.529 |
| dtoc1nd | 23 | 33 | 42 | 46 | 3.416 | 25 | 42 | 47 | 47 | 3.603 |
| eigena2 | 27 | 95 | 55 | 122 | 1574.0 | 26 | 98 | 57 | 126 | 1683.6 |
| eigenaco | 2 | 4 | 5 | 6 | 114.128 | 2 | 4 | 5 | 6 | 117.059 |
| eigenb2 | 3 | 5 | 7 | 8 | 154.210 | 3 | 5 | 7 | 8 | 143.225 |
| eigenbco | 2 | 3 | 5 | 5 | 112.406 | 2 | 3 | 5 | 5 | 118.161 |
| eigenc2 | 3 | 6 | 7 | 7 | 122.603 | 3 | 6 | 7 | 7 | 135.140 |
| eigencco | 16 | 26 | 33 | 42 | 2.096 | 18 | 31 | 37 | 49 | 2.373 |
| fccu(*) | 22 | 49 | 45 | 71 | 1.341 | 19 | 44 | 39 | 63 | 1.166 |
| genhs28(*) | 8 | 9 | 17 | 17 | 0.276 | 8 | 9 | 17 | 17 | 0.275 |
| gilbert | 2 | 5 | 6 | 8 | 134.221 | 2 | 5 | 6 | 8 | 127.802 |
| hs006 | – | – | – | – | – | 19 | 70 | 39 | 89 | 0.048 |
| hs007 | 10 | 23 | 21 | 33 | 0.030 | 9 | 23 | 19 | 32 | 0.024 |
| hs008(*) | 3 | 4 | 7 | 7 | 0.007 | 2 | 3 | 5 | 5 | 0.05 |
| hs026 | 44 | 137 | 89 | 181 | 0.095 | 157 | 924 | 315 | 1081 | 0.339 |
| hs027(*) | – | – | – | – | – | – | – | – | – | – |
| hs028(*) | 9 | 11 | 19 | 20 | 0.030 | 8 | 10 | 17 | 18 | 0.027 |
| hs039 | 14 | 21 | 29 | 35 | 0.081 | 15 | 29 | 31 | 44 | 0.090 |
| hs040(*) | 5 | 5 | 11 | 10 | 0.044 | 5 | 5 | 11 | 10 | 0.041 |
| hs046(*) | 61 | 515 | 123 | 577 | 2.961 | 180 | 188 | 360 | 206 | 8.738 |
| hs047 | 89 | 89 | 179 | 478 | 0.524 | – | – | – | – | – |
| hs048(*) | 9 | 18 | 19 | 27 | 0.046 | 8 | 17 | 17 | 25 | 0.043 |
| hs049(*) | 31 | 42 | 63 | 73 | 0.154 | 31 | 42 | 63 | 73 | 0.148 |
| hs050(*) | 15 | 26 | 31 | 41 | 0.097 | 15 | 26 | 31 | 41 | 0.094 |
| hs051(*) | 10 | 20 | 21 | 30 | 0.066 | 11 | 23 | 23 | 34 | 0.072 |
| hs052(*) | 11 | 29 | 23 | 40 | 0.074 | 10 | 25 | 21 | 35 | 0.069 |
| hs077(*) | 23 | 32 | 47 | 55 | 0.109 | 23 | 28 | 47 | 51 | 0.106 |
| hs078(*) | 6 | 7 | 13 | 13 | 0.040 | 6 | 7 | 13 | 13 | 0.041 |
| hs079(*) | 15 | 21 | 31 | 36 | 0.098 | 13 | 16 | 27 | 29 | 0.081 |
| hs100lnp | 41 | 78 | 83 | 119 | 0.320 | 76 | 248 | 553 | 256 | 21.461 |
| hs111lnp | 86 | 201 | 173 | 287 | 1.855 | 106 | 306 | 213 | 412 | 2.279 |
| maratos(*) | 3 | 3 | 7 | 6 | 2.414 | 4 | 4 | 9 | 8 | 3.097 |
| mwright(*) | 19 | 51 | 39 | 70 | 142.905 | 12 | 28 | 25 | 40 | 93.680 |
| orthregb | 13 | 15 | 27 | 28 | 0.982 | 12 | 13 | 25 | 25 | 0.897 |
Table 5. Numerical comparisons, continued (columns 2–6: Algorithm 5; columns 7–11: Algorithm 6).

| Name | iter | nf | ng | nc | cput | iter | nf | ng | nc | cput |
|---|---|---|---|---|---|---|---|---|---|---|
| bt2(*) | 19 | 22 | 39 | 41 | 0.464 | 33 | 44 | 67 | 77 | 0.059 |
| bt3(*) | 14 | 30 | 29 | 44 | 0.088 | 24 | 48 | 49 | 72 | 0.150 |
| bt4 | 17 | 39 | 35 | 56 | 0.056 | 27 | 65 | 55 | 92 | 0.072 |
| bt5 | 8 | 9 | 17 | 17 | 0.035 | 10 | 14 | 21 | 24 | 0.042 |
| bt6(*) | 37 | 76 | 75 | 113 | 0.159 | 29 | 54 | 59 | 83 | 0.128 |
| bt7 | – | – | – | – | – | 27 | 22 | 55 | 109 | 0.141 |
| bt9 | 15 | 27 | 31 | 42 | 0.058 | 12 | 15 | 25 | 27 | 0.044 |
| bt10(*) | 2 | 2 | 5 | 4 | 0.005 | 4 | 4 | 9 | 8 | 0.010 |
| bt11(*) | 14 | 20 | 29 | 34 | 0.101 | 11 | 15 | 23 | 26 | 0.081 |
| bt12(*) | 33 | 96 | 67 | 129 | 0.227 | 18 | 34 | 37 | 52 | 0.125 |
| catena | 31 | 37 | 46 | 49 | 2.317 | 33 | 41 | 45 | 53 | 3.004 |
| dtoc1nd | 25 | 28 | 54 | 65 | 2.815 | 26 | 38 | 54 | 67 | 3.744 |
| eigena2 | 27 | 95 | 55 | 122 | 1576.2 | 27 | 95 | 55 | 122 | 1486.5 |
| eigenaco | 2 | 4 | 5 | 6 | 118.047 | 2 | 4 | 5 | 6 | 116.598 |
| eigenb2 | 3 | 5 | 7 | 8 | 118.710 | 3 | 5 | 7 | 8 | 121.337 |
| eigenbco | 2 | 3 | 5 | 5 | 113.767 | 2 | 3 | 5 | 5 | 115.366 |
| eigenc2 | 3 | 6 | 7 | 7 | 125.706 | 3 | 6 | 7 | 7 | 119.482 |
| eigencco | – | – | – | – | – | 16 | 24 | 33 | 40 | 2.034 |
| fccu(*) | 20 | 47 | 41 | 67 | 1.228 | 21 | 47 | 43 | 68 | 1.294 |
| genhs28(*) | 8 | 9 | 17 | 17 | 0.278 | 8 | 9 | 17 | 17 | 0.277 |
| gilbert | 2 | 5 | 6 | 8 | 131.025 | 2 | 5 | 6 | 8 | 129.304 |
| hs006 | – | – | – | – | – | – | – | – | – | – |
| hs007 | 10 | 22 | 21 | 32 | 0.030 | 12 | 28 | 25 | 40 | 0.032 |
| hs008(*) | 2 | 3 | 5 | 5 | 0.005 | 3 | 4 | 7 | 7 | 0.010 |
| hs026 | 127 | 630 | 255 | 757 | 0.272 | 37 | 101 | 75 | 138 | 0.081 |
| hs027(*) | – | – | – | – | – | – | – | – | – | – |
| hs028(*) | 9 | 11 | 19 | 20 | 0.033 | 9 | 11 | 19 | 20 | 0.033 |
| hs039 | 15 | 27 | 31 | 42 | 0.088 | 12 | 15 | 25 | 27 | 0.077 |
| hs040(*) | 5 | 5 | 11 | 10 | 0.045 | 5 | 5 | 11 | 10 | 0.045 |
| hs046(*) | – | – | – | – | – | 189 | 203 | 379 | 222 | 9.682 |
| hs047 | – | – | – | – | – | 44 | 117 | 89 | 161 | 0.257 |
| hs048(*) | 9 | 18 | 19 | 27 | 0.043 | 9 | 18 | 19 | 27 | 0.045 |
| hs049(*) | 31 | 42 | 63 | 73 | 0.149 | 31 | 42 | 63 | 73 | 0.156 |
| hs050(*) | 15 | 26 | 31 | 41 | 0.094 | 15 | 26 | 31 | 41 | 0.099 |
| hs051(*) | 10 | 20 | 21 | 30 | 0.064 | 10 | 20 | 21 | 30 | 0.064 |
| hs052(*) | 11 | 29 | 23 | 40 | 0.073 | 12 | 31 | 25 | 43 | 0.082 |
| hs077(*) | 22 | 25 | 45 | 47 | 0.103 | 23 | 31 | 47 | 54 | 0.108 |
| hs078(*) | 6 | 7 | 13 | 13 | 0.041 | 6 | 7 | 13 | 13 | 0.040 |
| hs079(*) | 12 | 18 | 25 | 30 | 0.077 | 12 | 19 | 25 | 31 | 0.077 |
| hs100lnp | 372 | 390 | 914 | 647 | 44.907 | 270 | 254 | 541 | 281 | 21.221 |
| hs111lnp | 123 | 353 | 247 | 476 | 2.664 | 106 | 280 | 213 | 386 | 2.286 |
| maratos(*) | 4 | 4 | 9 | 8 | 3.134 | 3 | 3 | 7 | 6 | 2.453 |
| mwright(*) | 18 | 32 | 37 | 50 | 173.467 | 11 | 24 | 23 | 35 | 83.719 |
| orthregb | 11 | 11 | 23 | 22 | 0.820 | 13 | 15 | 27 | 28 | 0.985 |
For all problems solved successfully, every algorithm attained the best-known objective value listed in Table 2 (within the termination tolerance). Comparing the six algorithms, we see that Algorithm 1, which successfully solves all of the problems, is superior to the others. These numerical results indicate that the proposed algorithms are feasible and effective.
In the following, we compare Algorithm 1 with Algorithm INS in terms of the number of iterations. Algorithm INS (inexact Newton with SMART tests) was proposed in Byrd et al.9 and is also designed for nonconvex equality constrained optimization. Since both algorithms target nonconvex problems, we restrict the comparison to a selection of larger-scale nonconvex equality constrained test problems. The data are listed in Table 6, where iter-A1 and iter-INS stand for the numbers of iterations of Algorithm 1 and Algorithm INS, respectively. The comparison suggests that Algorithm 1 is slightly superior to Algorithm INS.
Table 6. Iteration statistics for Algorithm 1 and Algorithm INS.

| Name | n | m | iter-A1 | iter-INS | Name | n | m | iter-A1 | iter-INS |
|---|---|---|---|---|---|---|---|---|---|
| catena | 32 | 11 | 23 | 48 | dtoc1nd | 735 | 490 | 24 | 49 |
| eigena2 | 110 | 55 | 26 | 148 | eigenaco | 110 | 55 | 2 | 28 |
| eigenb2 | 110 | 55 | 3 | 21 | eigenbco | 110 | 55 | 2 | 218 |
| eigenc2 | 462 | 231 | 3 | 130 | eigencco | 30 | 15 | 18 | 172 |
| gilbert | 1000 | 1 | 2 | 22 | hs026 | 3 | 1 | 203 | 203 |
| hs039 | 4 | 2 | 14 | 17 | orthregb | 27 | 6 | 12 | 21 |
Conclusions
We have presented and analyzed a family of inexact secant methods for nonconvex equality constrained optimization. By combining these algorithms with an Armijo line search technique and adopting the nonsmooth exact penalty function as the merit function, we have proved that the proposed methods are globally convergent under standard assumptions. The numerical results on a large set of test problems show that the proposed methods exhibit good practical performance in terms of efficiency and robustness. We believe that these inexact secant methods will find applications in more diverse areas.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported partially by the National Natural Science Foundation of China (Grant 11571074) and the Natural Science Foundation of Hunan Province (Grant 2016JJ2038).
References
1. Coleman TF, Conn AR. On the local convergence of a quasi-Newton method for the nonlinear programming problem. SIAM J Numer Anal 1984; 21: 755–769.
2. Nocedal J, Overton ML. Projected Hessian updating algorithms for nonlinearly constrained optimization. SIAM J Numer Anal 1985; 22: 821–850.
3. Fontecilla R. Local convergence of secant methods for nonlinear constrained optimization. SIAM J Numer Anal 1988; 25: 692–712.
4. Byrd RH, Schnabel RB. Continuity of the null space basis and constrained optimization. Math Program 1986; 35: 32–41.
5. Byrd RH. On the convergence of constrained optimization methods with accurate Hessian information on a subspace. SIAM J Numer Anal 1990; 27: 141–153.
6. Byrd RH, Curtis FE, Nocedal J. An inexact SQP method for equality constrained optimization. SIAM J Optim 2008; 19: 351–369.
7. Dembo RS, Eisenstat SC, Steihaug T. Inexact Newton methods. SIAM J Numer Anal 1982; 19: 400–408.
8. Dembo RS, Steihaug T. Truncated-Newton algorithms for large-scale unconstrained optimization. Math Program 1983; 26: 190–212.
9. Byrd RH, Curtis FE, Nocedal J. An inexact Newton method for nonconvex equality constrained optimization. Math Program 2010; 122: 273–299.
10. Wang ZJ, Zhu DT, Nie CY. A filter line search algorithm based on an inexact Newton method for nonconvex equality constrained optimization. Acta Math Appl Sin Engl Ser 2017; 33: 687–698.
11. Gu C, Zhu DT. An inexact secant algorithm for large scale nonlinear systems of equalities and inequalities. Appl Math Model 2012; 36: 3612–3620.
12. Wang ZJ, Cai L, Zhu DT. Line search filter inexact secant methods for nonlinear equality constrained optimization. Appl Math Comp 2015; 263: 47–58.