This paper presents and analyzes an affine-scaling interior-point algorithm with a filter line-search method for solving nonlinear optimization problems with nonlinear equality constraints and nonnegative variables. In our scheme, a damped Newton method is applied to the perturbed first-order necessary conditions to produce a search direction. Filter rules with a fixed barrier parameter are used to determine step acceptance. A second-order correction technique is used to reduce infeasibility and overcome the Maratos effect. Global convergence and a fast local convergence rate of the proposed algorithm are established under suitable conditions.
The optimization problem considered in this paper is
where and with are twice continuously differentiable.
Filter methods have been applied in many current optimization techniques (see e.g. Arzani and Peyghami,1 Eichfelder et al.,2 and Li and Zhu3). Interior-point methods compute a step by solving a linear system, and researchers have extended filter methods to interior-point methods. Ulbrich et al.4 and Wächter and Biegler5,6 developed convergence theory for interior-point filter methods. Ulbrich et al. decomposed the primal-dual step obtained from the perturbed first-order necessary conditions into a normal and a tangential step, whose sizes were controlled by a trust-region type parameter. However, replacing the objective by the optimality measure means that the algorithm may be more likely to converge to stationary points that are not local minimizers. Wächter and Biegler proposed a filter framework based on line search that can be applied to barrier interior-point methods as well as active-set SQP methods. Their search direction is obtained by solving a quadratic subproblem, which can be computationally expensive. Wächter and Biegler’s interior-point method avoids the pitfall of many interior-point methods that may converge to spurious stationary points (see Wächter and Biegler5,6 and Vicente7).
In this paper, we propose a new affine-scaling interior-point filter line-search method for solving the optimization problem (1.1). Unlike the method of Wächter and Biegler,6 we incorporate an affine-scaling strategy and obtain a search direction by solving a system of linear equations, so the computational cost per iteration is relatively small. For local convergence, we show that the proposed method has a quadratic convergence rate. The paper is organized as follows. Section “The basic algorithm” presents the overall algorithm, including the affine-scaling barrier method and the filter line-search procedure. In section “Convergence analysis,” global and local convergence properties of the proposed algorithm are established.
Throughout the paper, the element of a sequence is denoted by a subscript (e.g. ). In particular, is denoted by and by where is the Jacobian of The th component of a vector is written as .
The basic algorithm
In this section, we modify and improve an existing interior-point line-search algorithm; the complete method is formally stated in the subsection “The basic algorithm” below.
where and are the Lagrangian multipliers. The Karush-Kuhn-Tucker (K-K-T) conditions of this problem can be written as
and where for a vector stands for the vector of all ones of appropriate dimension. By introducing a diagonal scaling matrix with
we can obtain modified K-K-T conditions in the form
and where is a non-negative number. Note that the equations for together with “” are the K-K-T conditions of the original problem (1.1). Furthermore, the modified Newton step for (2.4) satisfies
where and with
On the other hand, as a barrier method, the proposed algorithm computes (approximate) solutions for a sequence of barrier problems
with the barrier parameter sequence , which is driven to zero. Let be the Lagrangian function for the problems (2.6) defined by Since we can write (2.5) as
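For concreteness, the log-barrier objective underlying the barrier problems (2.6) can be sketched as follows. This is a minimal illustration, where `f` is a hypothetical callable for the objective function and the logarithmic barrier term penalizes the nonnegative variables:

```python
import numpy as np

def barrier_objective(f, x, mu):
    """Log-barrier objective phi_mu(x) = f(x) - mu * sum_i ln(x_i).

    `f` is a hypothetical objective callable; all components of x
    must be strictly positive for the barrier term to be defined.
    """
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0.0):
        raise ValueError("barrier objective requires x > 0")
    return f(x) - mu * np.sum(np.log(x))
```

As the barrier parameter is driven to zero, minimizers of this barrier objective approach a solution of the original problem (1.1).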
Optimality conditions for problem (1.1) are well established by (2.4) for A feasible point is said to be a stationary point for problem (1.1) if there exists a vector such that
Strict complementarity is said to hold at if exactly one of the two inequalities and () holds, that is, If the pair satisfies strict complementarity, the second-order sufficient conditions are equivalent to (2.8) and the positive definiteness of on the null space of
Under the standard Newton’s method assumptions for problem (1.1), we have that the matrix
is nonsingular. The proof is given in [Vicente,8 Prop. 3.3].
Therefore, the form of the coefficient matrix in (2.7) and the standard Newton’s method assumptions imply that the affine-scaling interior-point Newton algorithm is well defined in a neighborhood of a nondegenerate regular point that satisfies the second-order sufficient optimality conditions.
The filter line-search method and second-order correction steps
We view (2.6) as a bi-objective optimization problem that minimizes the objective function and the constraint violation where is a given fixed value of the barrier parameter . We apply the filter technique developed and analyzed in sections 2 and 3 of Wächter and Biegler6 to solve the barrier problem (2.6) and replace all occurrences of “” by “.” Using the transformation , (2.7) implies that
the switching condition (9) of Wächter and Biegler6 holds if the projection of the top-left matrix in (2.7) onto the null space of is uniformly positive definite and a feasible non-optimal point is approached. However, the Armijo condition (11) of Wächter and Biegler6 prevents the method from converging to such a point.
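The filter acceptance test referenced above can be sketched as follows. This is a minimal Wächter-Biegler-style illustration: a trial pair of constraint violation `theta` and barrier objective `phi` is acceptable if it improves on every stored filter entry by a small margin. The margin constants `gamma_theta` and `gamma_phi` are illustrative choices, not necessarily the paper's values:

```python
def acceptable_to_filter(theta, phi, filt, gamma_theta=1e-5, gamma_phi=1e-5):
    """Return True if the trial pair (theta, phi) is not dominated by
    any stored filter entry (theta_l, phi_l), up to small margins."""
    return all(
        theta <= (1.0 - gamma_theta) * theta_l or phi <= phi_l - gamma_phi * theta_l
        for (theta_l, phi_l) in filt
    )
```

A rejected trial point triggers a smaller step size, while an accepted point that fails the switching/Armijo tests is added to the filter.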
Since is positive at the optimal solution of the barrier problem (2.6), this property is maintained for all iterates. Therefore, in order for to be strictly feasible, is chosen to satisfy where the parameter is defined by
with . Here, given an initial value for the barrier parameter the parameter is obtained from
with constants and The barrier parameter is thus eventually decreased at a superlinear rate, while the lower bound ensures that numerical difficulties are avoided at the end of the optimization procedure. More precisely, we choose
Hence, there exists such that
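One concrete update rule matching the description above (eventual superlinear decrease of the barrier parameter, with a lower bound tied to the error tolerance) is the rule used in Wächter-Biegler-type codes; the constants below are illustrative choices, not necessarily those of the paper:

```python
def update_barrier(mu, eps_tol, kappa_mu=0.2, theta_mu=1.5):
    """Decrease the barrier parameter: linear rate kappa_mu * mu at
    first, superlinear rate mu**theta_mu once mu is small, and never
    below a fraction of the error tolerance eps_tol."""
    return max(eps_tol / 10.0, min(kappa_mu * mu, mu ** theta_mu))
```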
By the standard Newton’s method assumptions for problem (1.1), there exists satisfying the K-K-T conditions (2.8). Since strict complementarity for problem (1.1) holds at we have Pick
Define the set of active indices
If then Since and is an interior-point, we have and then Using (2.13), we can obtain
If then Since we have The first equation in (2.5) implies that
where and are positive constants and independent of and we use in the first inequality.
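The maximal step to the boundary estimated in this paragraph can be computed componentwise; a minimal sketch of the fraction-to-the-boundary rule x + alpha*d >= (1 - tau)*x, assuming strictly positive iterates and a parameter tau in (0, 1):

```python
import numpy as np

def max_step_to_boundary(x, d, tau):
    """Largest alpha in (0, 1] with x + alpha * d >= (1 - tau) * x
    componentwise; only components with d_i < 0 restrict the step."""
    x = np.asarray(x, dtype=float)
    d = np.asarray(d, dtype=float)
    neg = d < 0.0
    if not np.any(neg):
        return 1.0
    return min(1.0, float(np.min(-tau * x[neg] / d[neg])))
```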
Combining this with formula (18) of Wächter and Biegler,6 we can summarize it in the following formula for a minimal trial step size
with a “safety factor” When becomes smaller than , the algorithm reverts to the feasibility restoration phase.
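The backtracking loop with this minimal-step cutoff can be sketched as follows, where `accept` is a hypothetical callable bundling the filter and Armijo/sufficient-decrease acceptance tests:

```python
def backtrack(alpha_max, alpha_min, accept, shrink=0.5):
    """Reduce the trial step size until `accept(alpha)` holds; if it
    falls below alpha_min, return None to signal that the feasibility
    restoration phase should be invoked."""
    alpha = alpha_max
    while alpha >= alpha_min:
        if accept(alpha):
            return alpha
        alpha *= shrink
    return None  # caller switches to the restoration phase
```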
Since the filter approach can suffer from the Maratos effect, we improve the search direction by means of a second-order correction. If the first trial step size has been rejected, we compute a second-order correction
where denotes the pseudo-inverse of Therefore, can be written as
where and is defined by (2.7). Note that has the property that satisfies a linearization of the constraints at the point , that is,
If is a second-order correction step, and is an additional second-order correction step, then can be understood as a single second-order correction step for Similarly, several consecutive correction steps can be considered as a single one.
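A minimal sketch of one second-order correction step in the spirit of (2.17), using the pseudo-inverse of the constraint Jacobian: the correction is the least-norm solution of the linearized constraints at the rejected trial point. The sign and Jacobian conventions here are assumptions for illustration:

```python
import numpy as np

def soc_step(A, c_trial):
    """Least-norm second-order correction d_soc with A @ d_soc = -c_trial,
    where A is the constraint Jacobian and c_trial holds the constraint
    values at the rejected trial point."""
    return -np.linalg.pinv(A) @ c_trial
```

By construction, A @ soc_step(A, c) + c vanishes when A has full row rank, which is exactly the linearization property stated above.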
The basic algorithm
The algorithm contains an inner and an outer loop. Using the individual parts of (2.4), we define the optimality error for the barrier problem as
By we denote (2.20) with ; this measures the optimality error for the original problem (1.1). The overall algorithm terminates if an approximate solution satisfying is found, where is the user-provided error tolerance. We denote the iteration counter for the “outer loop” by and require that the approximate solution of the barrier problem (2.6), for a given value of , satisfies the tolerance for a constant before the algorithm continues with the solution of the next barrier problem. A precise description of the algorithm follows.
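The optimality error for the barrier problem can be sketched as the maximum of the residual norms of the perturbed K-K-T system (2.4); the scaling factors used in the paper's definition are omitted here for simplicity, and the sign conventions are assumptions:

```python
import numpy as np

def opt_error(grad_f, A, c, x, lam, z, mu):
    """Max-norm optimality error: dual infeasibility, primal
    infeasibility, and perturbed complementarity X Z e - mu e."""
    dual = grad_f + A @ lam - z   # stationarity residual
    comp = x * z - mu             # complementarity residual
    return max(np.linalg.norm(dual, np.inf),
               np.linalg.norm(c, np.inf),
               np.linalg.norm(comp, np.inf))
```

With mu = 0 this measures the error for the original problem (1.1), so the outer loop stops once it falls below the user-provided tolerance.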
Algorithm I
Initialize. Give a starting point with ; initialize the filter and the iteration counter and obtain from (2.10).
Check convergence for the overall problem. If , stop;
Check convergence for the barrier problem. If , then:
If , repeat step 3; otherwise, go to the next step.
Compute search direction. Compute by solving the linear system
where is the Hessian matrix or its approximation. Then compute by setting If the linear system is detected to be too ill-conditioned, go to the feasibility restoration phase.
Backtracking line search.
Set and .
Compute new trial point .
Check acceptability to the filter. If reject the trial step size and go to 5.5.
Check sufficient decrease with respect to current iterate.
Case I: and
holds: If the Armijo condition
holds and , accept the trial step and go to step 6. Otherwise, go to step 5.5.
Case II: If the sufficient decrease condition holds and , accept the trial step and go to step 6. Otherwise, go to step 5.5.
Compute second-order correction step. If , go to step 5.8. Otherwise, compute the second-order correction step by (2.17), and obtain the new trial point
Check acceptability to the filter. If , reject the second-order correction step and go to step 5.8.
Check sufficient decrease with respect to current iterate.
holds and , accept and go to step 6. Otherwise, go to step 5.8.
Set and If with defined in (2.15), go to the feasibility restoration phase. Otherwise, go back to step 5.2.
Accept trial point. Set and .
Augment the filter if necessary. If (2.20) or (2.21) does not hold for , augment the filter using Otherwise, set .
Set , and go to step 2.
Feasibility restoration phase. Augment the filter, and compute a new iterate by decreasing the infeasibility measure , so that satisfies the sufficient decrease conditions (2.22) and is acceptable to the filter. Continue with the regular iteration in step 9.
Convergence analysis
We denote by the set of indices of those iterations in which the filter has been augmented, by that of all iteration indices at which the feasibility restoration phase is invoked, and by that of those iteration counters at which the restoration phase is invoked from step 4. Obviously, the relationship holds. For the global convergence analysis of this algorithm, we state the following assumptions.
Assumptions G. Let be the sequence generated by Algorithm I, where we assume that the feasibility restoration phase in step 10 always terminates successfully and the algorithm does not stop in step 2.
and are differentiable on an open set , where with for all . For and , their function values and their first derivatives are bounded and Lipschitz-continuous over .
The matrices are uniformly bounded for all
The Hessian approximations are uniformly positive definite on the null subspace of . In other words, there exists a constant so that for all
where is the minimum eigenvalue of the matrix and the columns of form an orthonormal basis of the null space of
The matrices have full column rank over . Norm sequences are bounded for all .
There exists a constant so that whenever
The iterates are bounded.
There exist constants so that whenever the restoration phase is called in step 10 in an iteration with it returns a new iterate with for all components satisfying
Using the QR-factorization of , we can obtain matrices and . The columns of form an orthonormal basis of . Set
the overall search direction can be taken, where and are two orthogonal components.
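The orthonormal bases from the QR factorization described above can be computed with NumPy's full QR factorization; a minimal sketch, assuming the matrix A has full column rank:

```python
import numpy as np

def range_and_null_bases(A):
    """Full QR factorization A = Q R with Q = [Y  Z]: the first n
    columns Y span range(A), and the remaining columns Z form an
    orthonormal basis of the null space of A^T."""
    m, n = A.shape  # m >= n, full column rank assumed
    Q, _ = np.linalg.qr(A, mode="complete")
    return Q[:, :n], Q[:, n:]
```

A search direction can then be split into the two orthogonal components d = Y @ (Y.T @ d) + Z @ (Z.T @ d), mirroring the decomposition above.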
We define the criticality measure as
Consider a subsequence of iterates with and for some feasible limit point . Using the definition of we get for sufficiently large. Assumption (G4) and (3.2a) imply that Hence, from Assumption (G3) we obtain Therefore, defined in this way is an optimality measure for the barrier problem (2.6).
The following lemma can be found in Wächter and Biegler6 and indicates that the iterates generated by Algorithm I (for the barrier algorithm) are bounded away from the boundary of the region defined by the constraints .
Suppose Assumptions G hold. Then there exists a constant so that for all
Suppose Assumptions G hold. Then there exist constants , such that for all where
The proofs are similar to those of Lemma 1 of Wächter and Biegler6 and are omitted.
Suppose Assumptions G hold. If is a subsequence of iterates for which with a constant independent of , then there exist constants , such that
for all
Consider a subset of iterates with . Then, by Assumption (G4), for all with we have . Since (3.2a) implies that it follows that for
for some constants If we now define , it follows for all with that The claim follows after defining . □
We can build on the previous lemmas and use the same arguments as in the proof of Theorem 2 of Wächter and Biegler6 to prove the following global convergence result for the proposed method.
Suppose Assumptions G hold. Then
The local convergence analysis of the algorithm is based on the following assumptions:
Assumptions L. Assume that converges to a local solution of problem (1.1). Assume furthermore that the standard Newton’s method assumptions and the following hold.
In (2.19), is uniformly positive definite on the null space of as well as bounded.
Define Since
condition (3.5) can be written as follows:
which means
If Assumptions L hold, then there exists a neighborhood of , so that for all we have
The proof is essentially identical to that of Lemma 4.1 of Wächter and Biegler,5 which implies (3.7) by (2.16). Since
for close to , where noting that is Lipschitz-continuous, the claim follows. □
Suppose Assumptions L hold and the strict complementarity of problem (1.1) holds at the limit point of . Then there exists a neighborhood of so that whenever (2.20) holds for , the Armijo condition (2.23) is satisfied.
Let denote the step size along to the boundary of the inequality constraints . Because strict complementarity for problem (1.1) holds at the limit point of we have that Pick
If then and then imply that both and hold. Since is bounded, we can obtain from (2.13)
If then Since we have and Since for sufficiently large and , we deduce that, for sufficiently large
From the above discussion, we have shown that if the step size is given by (2.13), then the step size is determined by (2.20) and (2.21). It remains to show that whenever (2.20) holds for , the Armijo condition (2.23) is satisfied.
First, we try to get the relationship between , and Choose to be the neighborhood from Lemma 3.4. From the switching condition (2.20), we have
since and is uniformly bounded in . From (3.2a) and (3.9), it is easy to verify that
and therefore On the other hand, we can get Furthermore, we can easily get
and
imply that
where Since as , the proposed algorithm implies the Armijo condition (2.23) with because of Assumption (L1) and , if is sufficiently close to . □
In the following, we employ the exact penalty function and a model of the penalty function to prove the local convergence results and to account for the effect of the second-order correction steps. The algorithm itself never refers to them.
Suppose Assumptions L hold. Then there exists a neighborhood and a constant , so that for all we have
Since
choosing we obtain which yields the required latter inequality. Further, we have that
The last inequality follows by using and Assumption (L1). Then, defining , we obtain the desired results. □
In preparation for the main local convergence theorem, we define constants with:
Suppose Assumptions L hold. Then there exists a neighborhood and constants , so that for all , we have
Using the definitions of , we get
It follows that From
the claim (3.14b) follows. Note that and . Hence, we have . Combining this with the definition of gives (3.14c). □
The following result corresponds to the result that Wächter and Biegler5 obtained for the local version of the line search filter methods. The proof is similar and is omitted.
Suppose Assumptions L hold. Then, for sufficiently large, full steps of the form or are taken.
Suppose Assumptions L hold. Then, for sufficiently large, converges to superlinearly. Furthermore, if then converges to quadratically.
According to Theorem 3.2, for sufficiently large, full steps of the form or are taken in Algorithm I. We first show that the statements hold for
For the analysis below, it is convenient to use the following notations: and
Hence, Since
we can obtain Applying Theorem 3 of Yamashita and Yabe,9 one can show that the sequence converges superlinearly to that is, which indicates that
We now consider the second case that full steps of the form are taken. Since for all large we have for some positive constant Therefore, (3.7) yields Further, we get Hence, (3.15) implies that:
If we can get and further
Hence, we can have which implies that
□
Footnotes
Declaration of conflicting interests
The author(s) declare no potential conflicts of interest with respect to the research, authorship, and publication of this article.
Funding
This work was supported by Hunan Provincial Education Science “14-th Five-Year Plan” Annual Project (no. XJK21BGD032-ND214029).
ORCID iD
Zhujun Wang
References
1.
Arzani F, Peyghami MR. An approach based on dwindling filter method for positive definite generalized eigenvalue problem. Comp Appl Math 2018; 37(2): 1197–1212.
2.
Eichfelder G, Klamroth K, Niebling J. Nonconvex constrained optimization by a filtering branch and bound. J Global Optim 2021; 80: 31–61.
3.
Li D, Zhu DT. An affine-scaling interior trust-region method combining with line search filter technique for optimization subject to bounds on variables. Numer Algor 2018; 77(4): 1159–1182.
4.
Ulbrich M, Ulbrich S, Vicente LN. A globally convergent primal-dual interior-point filter method for nonconvex nonlinear programming. Math Program Ser B 2004; 100: 217–245.
5.
Wächter A, Biegler LT. Line search filter methods for nonlinear programming: Local convergence. SIAM J Optim 2005; 16(1): 32–48.
6.
Wächter A, Biegler LT. Line search filter methods for nonlinear programming: Motivation and global convergence. SIAM J Optim 2005; 16(1): 1–31.
7.
Wächter A, Biegler LT. On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. Math Program 2006; 106: 22–57.
8.
Vicente LN. On interior-point Newton algorithms for discretized optimal control problems with state constraints. Optim Methods Softw 1998; 8(3): 249–275.
9.
Yamashita H, Yabe H. Superlinear and quadratic convergence of some primal-dual interior-point methods for constrained optimization. Math Program 1996; 75: 377–397.