We present a family of new inexact secant methods, combined with an Armijo line search technique, for solving nonconvex constrained optimization problems. Different from the existing secant methods, the algorithms proposed in this paper need not compute exact search directions. By adopting the nonsmooth exact penalty function as the merit function, the global convergence of the proposed algorithms is established under reasonable conditions. Numerical results indicate that the proposed algorithms are both feasible and effective.
We assume that the nonlinear equality constrained problem to be solved is stated as
$$\min_{x\in\mathbb{R}^n} f(x) \quad \text{subject to} \quad c(x)=0, \qquad (1)$$
where $f:\mathbb{R}^n\to\mathbb{R}$ and $c:\mathbb{R}^n\to\mathbb{R}^m$ are smooth and possibly nonconvex functions.
The reduced Hessian methods in successive quadratic programming (SQP)1,2 and the secant methods (two-step algorithms) defined in Fontecilla3 are two of the most successful classes of methods for solving problem (1). Compared with the widely used reduced Hessian methods, in which the orthonormal basis for the tangent space of the constraints at the current point $x_k$ may fail to vary continuously with $k$,4,5 the main advantage of the secant methods rests in their use of an orthogonal projection operator that is continuous.
Two basic approaches, namely line search and trust region, have been developed in order to ensure global convergence toward local minima. In Byrd et al.,6 an algorithm based on a characterization of inexact sequential quadratic programming (SQP) steps that ensures global convergence was presented for large-scale equality constrained optimization; an exact penalty function was used to determine whether a given inexact step makes sufficient progress toward a solution of the nonlinear program. The inexact Newton method was initially proposed by Dembo et al.7 for solving large systems of nonlinear equations, and Dembo and Steihaug8 used that procedure to solve unconstrained minimization problems. More recently, Byrd et al.9 proposed an inexact Newton method for nonconvex equality constrained optimization; their method allows for the presence of negative curvature without requiring information about the inertia of the primal–dual iteration matrix. Subsequently, an inexact Newton method with a filter line search was proposed in Wang et al.10 for the same problem. In Gu and Zhu11 and Wang et al.,12 inexact secant algorithms were proposed for solving large-scale nonlinear systems of equalities and inequalities and nonlinear constrained convex optimization, respectively.
In this paper, we propose a class of new inexact secant methods with an Armijo line search for nonconvex problems. The methods are globalized by a line search technique that adopts the $\ell_1$ penalty function as the merit function. The paper is organized as follows. In the next section, the algorithms are developed. Then, we analyze their global convergence properties. The results of numerical experiments with these methods are discussed in the Numerical results section, and final remarks are provided in the Conclusions section.
The proposed algorithms
Throughout the paper, we denote the Euclidean vector or matrix norm by $\|\cdot\|$, the $\ell_1$ norm by $\|\cdot\|_1$ and the gradient of $f$ by $g(x)$. $\ell$ is the Lagrangian function defined for $x\in\mathbb{R}^n$ and $\lambda\in\mathbb{R}^m$ by
$$\ell(x,\lambda) = f(x) + \lambda^T c(x).$$
The gradient of the Lagrangian function with respect to $x$ is denoted $\nabla_x\ell(x,\lambda) = g(x) + A(x)^T\lambda$, where $A(x)$ is the Jacobian of $c$. We will be using the BFGS secant update defined by
$$W_{k+1} = W_k - \frac{W_k s_k s_k^T W_k}{s_k^T W_k s_k} + \frac{y_k y_k^T}{y_k^T s_k},$$
where $s_k = x_{k+1} - x_k$ and $y_k = \nabla_x\ell(x_{k+1},\lambda_{k+1}) - \nabla_x\ell(x_k,\lambda_{k+1})$. We will write $f_k = f(x_k)$, $g_k = g(x_k)$, $c_k = c(x_k)$, $A_k = A(x_k)$ and $P_k = P(x_k)$.
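For illustration, the BFGS secant update above can be realized as in the following minimal NumPy sketch (the experiments reported below were run in MATLAB); the curvature safeguard that skips the update is an assumption added for robustness, not a part of the method as stated.

```python
import numpy as np

def bfgs_update(W, s, y, curvature_tol=1e-8):
    """One BFGS secant update of the Lagrangian Hessian approximation W.

    s = x_{k+1} - x_k and y is the difference of Lagrangian gradients.
    The update is skipped when the curvature condition y^T s > 0 fails
    (a common safeguard, assumed here rather than taken from the paper).
    """
    Ws = W @ s
    sWs = s @ Ws
    ys = y @ s
    if ys <= curvature_tol * np.linalg.norm(s) * np.linalg.norm(y):
        return W  # skip the update to keep W positive definite
    return W - np.outer(Ws, Ws) / sWs + np.outer(y, y) / ys
```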
The following is the basic idea of the secant methods. At the $k$th iteration,
$$\lambda_{k+1} = \lambda_k + \Delta\lambda_k, \qquad (2a)$$
$$W_k\,\tilde u_k = -\nabla_x\ell(x_k,\lambda_{k+1}), \qquad (2b)$$
$$u_k = P_k\,\tilde u_k, \qquad (2c)$$
$$v_k = -A(x_k)^+\,c_k, \qquad (2d)$$
$$x_{k+1} = x_k + d_k, \quad d_k = u_k + v_k, \qquad (2e)$$
$$W_{k+1}\ \text{given by the BFGS update above}, \qquad (2f)$$
where $A(x)^+$ is the pseudo-inverse of the constraint Jacobian $A(x)$. The projection $P(x)$ onto the null space of $A(x)$ can be either the orthogonal projection given by
$$P(x) = I - A(x)^T\big(A(x)A(x)^T\big)^{-1}A(x), \qquad (3a)$$
or the oblique projection defined by
$$P(x) = I - W^{-1}A(x)^T\big(A(x)W^{-1}A(x)^T\big)^{-1}A(x). \qquad (3b)$$
The multiplier update in equation (2a) can be chosen from one of the following updates.
Projection update:
$$\lambda_{k+1} = -\big(A_kA_k^T\big)^{-1}A_k\,g_k. \qquad (4a)$$
Null-space update:
$$\lambda_{k+1} = -\big(A_kW_k^{-1}A_k^T\big)^{-1}A_kW_k^{-1}g_k. \qquad (4b)$$
Newton update:
$$\lambda_{k+1} = \big(A_kW_k^{-1}A_k^T\big)^{-1}\big(c_k - A_kW_k^{-1}g_k\big). \qquad (4c)$$
The pseudo-inverse of $A_k$ will be either
$$A_k^+ = A_k^T\big(A_kA_k^T\big)^{-1} \qquad (5a)$$
or
$$A_k^+ = W_k^{-1}A_k^T\big(A_kW_k^{-1}A_k^T\big)^{-1}. \qquad (5b)$$
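The projections in equation (3) and the pseudo-inverses in equation (5) involve only solves with the small $m \times m$ matrices $AA^T$ or $AW^{-1}A^T$. A minimal NumPy sketch is given below; forming the projectors explicitly is for illustration only, since a practical code would apply them as operators.

```python
import numpy as np

def orthogonal_projection(A):
    """P = I - A^T (A A^T)^{-1} A, orthogonal projector onto null(A), cf. (3a)."""
    n = A.shape[1]
    return np.eye(n) - A.T @ np.linalg.solve(A @ A.T, A)

def oblique_projection(A, W):
    """P = I - W^{-1} A^T (A W^{-1} A^T)^{-1} A, oblique projector, cf. (3b)."""
    n = A.shape[1]
    WinvAT = np.linalg.solve(W, A.T)           # W^{-1} A^T
    return np.eye(n) - WinvAT @ np.linalg.solve(A @ WinvAT, A)

def pseudo_inverse(A, W=None):
    """A^+ = A^T (A A^T)^{-1}, cf. (5a); with W given,
    A^+ = W^{-1} A^T (A W^{-1} A^T)^{-1}, cf. (5b)."""
    if W is None:
        return np.linalg.solve(A @ A.T, A).T   # (5a), using symmetry of A A^T
    WinvAT = np.linalg.solve(W, A.T)
    return WinvAT @ np.linalg.inv(A @ WinvAT)  # (5b)
```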
By choosing different projection operators in equation (3), different multiplier updates in equation (4) and different pseudo-inverses in equation (5), we obtain six algorithms, which are listed in Table 1.
Table 1. Algorithms.

| Algorithm | Multiplier update | P(x) | $A_k^+$ |
|---|---|---|---|
| 1 | (4b) | I | (5b) |
| 2 | (4a) | (3b) | (5b) |
| 3 | (4c) | (3a) | (5a) |
| 4 | (4b) | I | (5a) |
| 5 | (4a) | (3b) | (5a) |
| 6 | (4c) | (3a) | (5b) |
Next, we outline the algorithm and the globalization strategy. An integral part of the approach is the mechanism used to determine whether a trial step $d$ is acceptable during a given iteration. A merit function is commonly used for this purpose. The nondifferentiable $\ell_1$ merit function and Fletcher's exact and differentiable merit function are representative of most merit functions used in practice. The evaluation of the $\ell_1$ merit function, compared with that of Fletcher's augmented Lagrangian merit function, is very inexpensive. One of the former's potential drawbacks is that it can suffer from the Maratos effect, but several strategies have been developed to remove its damaging effects. We therefore adopt the $\ell_1$ penalty function as the merit function
$$\phi_\omega(x) = f(x) + \omega\,\|c(x)\|_1, \qquad (6)$$
where $\omega > 0$ is known as the penalty parameter. If $\omega$ is greater than a certain threshold, then a first-order optimal point of equation (1) is a stationary point of $\phi_\omega$. Although $\phi_\omega$ is not continuously differentiable, the directional derivative of $\phi_\omega$ in a direction $d$, denoted by $D\phi_\omega(x;d)$, is nonnegative at a stationary point $x^*$ for all $d\in\mathbb{R}^n$.
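A direct rendering of the merit function (6), and of the directional derivative formula established below in Lemma 1, might look as follows; this is an illustrative sketch in which f, c and g stand for user-supplied callables.

```python
import numpy as np

def phi(x, omega, f, c):
    """l1 exact penalty merit function, equation (6): f(x) + omega*||c(x)||_1."""
    return f(x) + omega * np.linalg.norm(c(x), 1)

def dphi(x, d, omega, g, c):
    """Directional derivative of phi along d when A(x) d = -c(x) (cf. Lemma 1):
    D phi_omega(x; d) = g(x)^T d - omega * ||c(x)||_1."""
    return g(x) @ d - omega * np.linalg.norm(c(x), 1)
```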
In this paper, we perform a backtracking line search to compute a step length coefficient $\alpha_k$ satisfying the Armijo condition
$$\phi_\omega(x_k + \alpha_k d_k) \le \phi_\omega(x_k) + \eta\,\alpha_k\,D\phi_\omega(x_k; d_k), \qquad (7)$$
where the constant $\eta\in(0,1)$. Considering a local model of the merit function around the current iterate $x_k$, the approximation has the form
$$m_k(d) = f_k + g_k^T d + \omega\,\|c_k + A_k d\|_1.$$
Hence, we can estimate the reduction in the merit function by evaluating
$$\Delta m_k(d) = m_k(0) - m_k(d) = -g_k^T d + \omega\,\big(\|c_k\|_1 - \|c_k + A_k d\|_1\big).$$
Once an acceptable step is obtained, we must ensure that a positive $\alpha_k$ can be calculated to satisfy the Armijo condition. We will consider this issue in Lemma 1.
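A backtracking realization of the Armijo condition (7) is sketched below; the constants eta, tau and alpha_min are illustrative choices, not values prescribed by the method.

```python
import numpy as np

def armijo_backtracking(x, d, omega, f, c, g,
                        eta=1e-4, tau=0.5, alpha_min=1e-12):
    """Backtracking line search for the l1 merit function: returns alpha with
    phi(x + alpha*d) <= phi(x) + eta*alpha*D phi(x; d), cf. equation (7)."""
    merit = lambda z: f(z) + omega * np.linalg.norm(c(z), 1)
    # directional derivative from Lemma 1 (valid when A(x) d = -c(x))
    slope = g(x) @ d - omega * np.linalg.norm(c(x), 1)
    phi0, alpha = merit(x), 1.0
    while merit(x + alpha * d) > phi0 + eta * alpha * slope:
        alpha *= tau
        if alpha < alpha_min:
            raise RuntimeError("line search failed: step became too small")
    return alpha
```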
For nonconvex equality constrained optimization, that is, problem (1), using the method proposed in Byrd et al.,9 we have the reduction condition
$$\Delta m_k(d_k) \ge \tfrac{\sigma}{2}\,\|u_k\|^2 + \sigma\,\omega\,\big(\|c_k\|_1 - \|c_k + A_k d_k\|_1\big), \qquad (8)$$
where $\sigma\in(0,1)$. The tangential component $u_k$ may not be available explicitly, and so we are not able to compute the norm of this tangential component directly. This measure can be approximated by $d_k^T W_k d_k$, but in general we desire a quantity $\Theta_k$ satisfying
$$0 \le \Theta_k \le \|u_k\|^2 \qquad (9)$$
that is as close to $\|u_k\|^2$ as possible so that equation (8) is not overly restrictive.
In this paper, we will compute the search direction inexactly, which means that equation (2b) will be substituted by
$$W_k\,\tilde u_k = -\nabla_x\ell(x_k,\lambda_{k+1}) + r_k, \qquad (10)$$
where $r_k$ is the residual vector.
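One natural way for the residual $r_k$ in equation (10) to arise is from a truncated iterative solver. The sketch below uses a plain conjugate gradient loop with a relative residual test; this is an illustrative assumption, since for an indefinite $W_k$ a practical code would add negative-curvature safeguards.

```python
import numpy as np

def inexact_solve(W, b, tol_rel=0.1, max_iter=None):
    """Approximately solve W u = b by conjugate gradients, stopping once
    ||b - W u|| <= tol_rel * ||b||; returns u and the final residual."""
    n = b.size
    max_iter = max_iter or 2 * n
    u = np.zeros(n)
    r = b.copy()              # residual b - W u (u starts at zero)
    p = r.copy()
    rr = r @ r
    for _ in range(max_iter):
        if np.sqrt(rr) <= tol_rel * np.linalg.norm(b):
            break
        Wp = W @ p
        alpha = rr / (p @ Wp)  # assumes positive curvature along p
        u += alpha * p
        r -= alpha * Wp
        rr_new = r @ r
        p = r + (rr_new / rr) * p
        rr = rr_new
    return u, r
```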
The model reduction condition in this work is
$$\Delta m_k(d_k) \ge \tfrac{\sigma}{2}\,\Theta_k + \sigma\,\omega_k\,\big(\|c_k\|_1 - \|c_k + A_k d_k\|_1\big). \qquad (11)$$
If the model reduction condition is not satisfied for the most recent value of the penalty parameter, we have the option of increasing the penalty parameter in order to satisfy it. This, however, should only be done in two circumstances. If $d_k^T W_k d_k > 0$, it is safe to assume that the problem is sufficiently convex, and so $d_k$ should be accepted. If $d_k^T W_k d_k \le 0$, we only consider increasing $\omega$ if $v_k$ is a significant component of $d_k$. We can express this condition as $\psi_k \ge \epsilon\,\|d_k\|^2$ for some $\epsilon > 0$, where $\psi_k$ is any lower bound for the squared norm $\|v_k\|^2$ of the normal step component. If none of the above conditions can be satisfied for the model reduction condition, we modify $W_k$. (It is assumed that after a finite number of modifications we have $d_k^T W_k d_k > 0$.) One technique is to modify $W_k$ to increase some or all of its eigenvalues so that the resulting matrix is closer to being positive definite.
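The decision logic of this paragraph can be summarized schematically as follows; the constants eps and mu are illustrative assumptions, and psi_lb stands for the lower bound $\psi_k$ discussed above.

```python
import numpy as np

def penalty_or_modify(W, d, psi_lb, eps=0.1, mu=1e-4):
    """Schematic version of the update logic described above. When the model
    reduction condition (11) fails: allow an increase of omega if the curvature
    along d is positive or the normal component is significant; otherwise
    modify W. psi_lb is a lower bound for ||v_k||^2."""
    dWd = d @ (W @ d)
    if dWd > 0 or psi_lb >= eps * (d @ d):
        return "increase_omega", W
    # shift the eigenvalues of W upward, toward positive definiteness
    shift = abs(min(np.linalg.eigvalsh(W).min(), 0.0)) + mu
    return "retry_with_modified_W", W + shift * np.eye(W.shape[0])
```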
We now formally state the globalized inexact secant methods for solving equation (1).
Algorithms
s1. Initialize. Choose a starting point $x_0$. Set constants $\eta\in(0,1)$, $\sigma\in(0,1)$, $\kappa\in(0,1)$, $\delta_\omega>0$, $\epsilon>0$ and $\omega_0>0$. Give $\lambda_0$ and $W_0$, and let $k:=0$.
s2. If $\|\nabla_x\ell(x_k,\lambda_k)\| + \|c_k\| \le \epsilon$, then stop.
s3. Evaluate $f_k$, $g_k$, $c_k$, $A_k$ and $W_k$ from equation (2f). Then compute the multiplier $\lambda_{k+1}$ from equation (4).
s4. Find an inexact step $d_k = P_k\tilde u_k + v_k$ from
$$W_k\,\tilde u_k = -\nabla_x\ell(x_k,\lambda_{k+1}) + r_k, \qquad v_k = -A_k^+ c_k,$$
satisfying Condition I or II, where $P_k$ is given by equation (3) and $A_k^+$ is given by equation (5).
Condition I
$$\|r_k\| \le \kappa\,\|\nabla_x\ell(x_k,\lambda_{k+1})\| \quad\text{and}\quad \Delta m_k(d_k) \ge \tfrac{\sigma}{2}\,\Theta_k + \sigma\,\omega_k\,\big(\|c_k\|_1 - \|c_k + A_k d_k\|_1\big),$$
where $\kappa\in(0,1)$ and $\Theta_k$ satisfies equation (9).
Condition II
$$\|r_k\| \le \kappa\,\|c_k\|,$$
where $\kappa\in(0,1)$. If Condition I or II cannot be satisfied, then modify $W_k$ by increasing some or all of its eigenvalues.
s5. If Condition II is satisfied and
$$\Delta m_k(d_k) < \tfrac{\sigma}{2}\,\Theta_k + \sigma\,\omega\,\big(\|c_k\|_1 - \|c_k + A_k d_k\|_1\big)$$
for $\omega = \omega_k$, where $\Theta_k$ satisfies equation (9), then set
$$\omega_{k+1} = \frac{g_k^T d_k + \tfrac{\sigma}{2}\,\Theta_k}{(1-\sigma)\big(\|c_k\|_1 - \|c_k + A_k d_k\|_1\big)} + \delta_\omega;$$
otherwise set $\omega_{k+1} = \omega_k$.
s6. Choose a step length $\alpha_k$ by backtracking until the Armijo condition (7), with $\omega = \omega_{k+1}$, is satisfied.
s7. Set $x_{k+1} = x_k + \alpha_k d_k$ and $k := k+1$. Go to s2.
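To make the flow of s1–s7 concrete, a compressed driver in the flavour of Algorithm 1 ($P_k = I$ with the null-space update (4b) and pseudo-inverse (5b)) is sketched below. It assembles the helper routines sketched earlier (bfgs_update, inexact_solve, armijo_backtracking); Conditions I and II are simplified to the fixed relative residual test inside inexact_solve, and s5 is replaced by a plain tenfold increase of $\omega$, so this is a schematic rendering rather than the algorithm itself.

```python
import numpy as np

def inexact_secant_a1(f, g, c, A, x0, lam0, omega0=1.0, eps=1e-6, max_iter=1000):
    """Schematic driver for steps s1-s7 in the flavour of Algorithm 1."""
    x, lam, omega = np.array(x0, float), np.array(lam0, float), omega0
    W = np.eye(x.size)                                         # s1
    for _ in range(max_iter):
        gk, ck, Ak = g(x), c(x), A(x)
        if np.linalg.norm(gk + Ak.T @ lam) + np.linalg.norm(ck) <= eps:
            break                                              # s2
        WinvAT = np.linalg.solve(W, Ak.T)                      # W_k^{-1} A_k^T
        lam = -np.linalg.solve(Ak @ WinvAT, WinvAT.T @ gk)     # (4b)
        u, _ = inexact_solve(W, -(gk + Ak.T @ lam))            # inexact (2b)
        v = -WinvAT @ np.linalg.solve(Ak @ WinvAT, ck)         # (2d) with (5b)
        d = u + v
        while np.linalg.norm(ck, 1) > 0 and \
                gk @ d - omega * np.linalg.norm(ck, 1) >= 0:
            omega *= 10.0                                      # simplified s5
        alpha = armijo_backtracking(x, d, omega, f, c, g)      # s6
        s = alpha * d
        y = (g(x + s) + A(x + s).T @ lam) - (gk + Ak.T @ lam)
        W = bfgs_update(W, s, y)                               # (2f)
        x = x + s                                              # s7
    return x, lam
```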
Lemma 1. The directional derivative of the merit function $\phi_\omega$ along a step $d_k$ satisfies
$$D\phi_\omega(x_k; d_k) = g_k^T d_k - \omega\,\|c_k\|_1. \qquad (12)$$
Proof. From equation (3) and equation (10), we know that $A_k d_k = -c_k$. Using Taylor's theorem, we have that
$$\phi_\omega(x_k + \alpha d_k) - \phi_\omega(x_k) = \alpha\,g_k^T d_k + \omega\big(\|c_k + \alpha A_k d_k\|_1 - \|c_k\|_1\big) + O(\alpha^2) = \alpha\big(g_k^T d_k - \omega\,\|c_k\|_1\big) + O(\alpha^2),$$
where the second equality uses $\|c_k + \alpha A_k d_k\|_1 = (1-\alpha)\|c_k\|_1$ for $\alpha\in[0,1]$. Dividing both sides by $\alpha$ and taking the limit as $\alpha\to 0^+$ yields the result. □
It is easy to prove that, if $d_k$ satisfies the following model reduction condition
$$\Delta m_k(d_k) \ge \tfrac{\sigma}{2}\,\Theta_k + \sigma\,\omega\,\big(\|c_k\|_1 - \|c_k + A_k d_k\|_1\big), \qquad (13)$$
where $\Delta m_k(d_k) = -g_k^T d_k + \omega\big(\|c_k\|_1 - \|c_k + A_k d_k\|_1\big)$ and $\sigma\in(0,1)$, then the directional derivative of the merit function satisfies
$$D\phi_\omega(x_k; d_k) \le -\tfrac{\sigma}{2}\,\Theta_k - \sigma\,\omega\,\|c_k\|_1 < 0,$$
which ensures that $d_k$ is a descent direction for $\phi_\omega$.
Global convergence
In this section, we will establish the global convergence of the proposed algorithms under the following assumptions.
Assumptions G. The sequences generated by the algorithms are contained in a convex set Ω and the following properties hold:
G1. $f$ and $c$ are bounded and twice continuously differentiable on Ω.
G2. The sequences $\{\lambda_k\}$ and $\{W_k\}$ are bounded.
G3. The sequence $\{W_k^{-1}\}$ is bounded.
G4. The matrices $A_k$ have full row rank and their smallest singular values are bounded below by some positive constant.
Lemma 2. The sequence $\{v_k\}$ given by equation (2d) is bounded and satisfies
$$\|v_k\| \le \gamma_1\,\|c_k\| \qquad (14)$$
for some $\gamma_1 > 0$.
Proof. From equations (2d), (5a) and (5b), we have that
$$v_k = -A_k^T\big(A_k A_k^T\big)^{-1} c_k \qquad (15)$$
or
$$v_k = -W_k^{-1}A_k^T\big(A_k W_k^{-1} A_k^T\big)^{-1} c_k. \qquad (16)$$
Assumptions G1, G2 and G4 imply that $A_k^T(A_kA_k^T)^{-1}$ and $W_k^{-1}A_k^T(A_kW_k^{-1}A_k^T)^{-1}$ are bounded; then, from equations (15) and (16), the result follows. □
Lemma 3. Suppose Assumptions G hold. Then the sequence of penalty parameters $\{\omega_k\}$ is bounded above, and $\omega_k = \bar\omega$ for some $\bar\omega > 0$ when $k$ is sufficiently large.
Proof. In our algorithm, $\omega_k$ is increased only if Condition II is satisfied and equation (11) fails. It means that each increase raises $\omega_k$ by at least $\delta_\omega > 0$, while, by Lemma 2 and Assumptions G, the values prescribed in s5 remain uniformly bounded; hence $\omega_k$ can be increased at most finitely many times, and the result follows. □
From the above lemma, we can easily prove the following result (see Lemma 5 in Byrd et al.9).
Lemma 4. The sequence of step lengths $\{\alpha_k\}$ generated in s6 is bounded below and away from zero.
We now state the main result as follows.
Theorem 1. Let $\{x_k\}$ be a sequence generated by the proposed algorithms. Suppose Assumptions G hold; then
$$\lim_{k\to\infty}\Big(\big\|g_k + A_k^T\lambda_k\big\| + \|c_k\|\Big) = 0.$$
Proof. By equation (13) and Lemma 4, there exists an $\bar\alpha > 0$ such that for all $k$
$$\phi_{\bar\omega}(x_{k+1}) \le \phi_{\bar\omega}(x_k) - \eta\,\bar\alpha\,\Delta m_k(d_k).$$
This, along with the fact that $\phi_{\bar\omega}$ is bounded below and Assumption G1, implies
$$\lim_{k\to\infty}\Delta m_k(d_k) = 0.$$
Using a way similar to Lemma 4.1 and Lemma 4.2 in Gu and Zhu,11 we can get
$$\lim_{k\to\infty}\big\|g_k + \bar\omega\,A_k^T\chi_k\big\| = 0,$$
where the components of $\chi_k$ can have the values 1, −1 or 0. This, along with the limits above and Assumption G2, implies that the theorem is true. □
Numerical results
In this section, we report some numerical experiments with the proposed algorithms, which were implemented in MATLAB 7.0. The numerical results were obtained by running our algorithms on a set of 44 equality constrained problems, which can be found on the web page: http://www.gamsworld.org/performance/princetonlib/htm/group5stat.htm.
In each case, the starting point supplied with the problem was used. All attempts to solve the test problems were limited to a maximum of 1000 iterations or half an hour of CPU time. The algorithm parameters were set to fixed values that seemed to work reasonably well for a broad class of problems. The Jacobian matrices and the Hessian matrices were approximated by finite differences.
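For example, the Jacobian approximation can be formed by forward differences as in the following sketch; the step size h is an illustrative choice, not the value used in our experiments.

```python
import numpy as np

def fd_jacobian(c, x, h=1e-7):
    """Forward-difference approximation of the constraint Jacobian A(x)."""
    c0 = np.asarray(c(x))
    J = np.zeros((c0.size, x.size))
    for j in range(x.size):
        xp = x.copy()
        xp[j] += h                      # perturb one coordinate at a time
        J[:, j] = (np.asarray(c(xp)) - c0) / h
    return J
```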
In Table 2, we present the names of the 44 problems together with their numbers of variables (n), numbers of equality constraints (m) and best-known objective values f*. Tables 3 to 5 list the numerical results obtained on these problems with Algorithms 1–6. In the tables, iter, nf, nc and ng stand for the number of iterations, the number of evaluations of f, the number of evaluations of c and the number of gradient evaluations of f, respectively. The CPU times (in seconds) are denoted by cput. A "–" means that the algorithm failed. A "(*)" means that the problem was strictly convex over the set of computed iterates. In every algorithm, the number of gradient evaluations of f, ng, is equal to the number of gradient evaluations of c, so the latter is not listed separately.
Basic data of the test functions.
Name
n
m
Name
n
m
bt2
3
1
0.0325
bt3
5
3
4.0930
bt4
3
2
−45.5105
bt5
3
2
961.7151
bt6
5
2
0.2770
bt7
5
3
306.1999
bt9
4
2
−1.0000
bt10
2
2
−1.0000
bt11
5
3
0.8248
bt12
5
3
6.1888
catena
32
11
−23077.7462
dtoc1nd
735
490
12.5277
eigena2
110
55
0.0000
eigenaco
110
55
0.0000
eigenb2
110
55
18.0000
eigenbco
110
55
9.0000
eigenc2
462
231
0.0000
eigencco
30
15
0.0000
fccu
19
8
11.1491
genhs28
10
8
0.9271
gilbert
1000
1
482.0272
hs006
2
1
0.0000
hs007
2
1
−1.7320
hs008
2
2
−1.0000
hs026
3
1
0.0000
hs027
3
1
0.0400
hs028
3
1
0.0000
hs039
4
2
−1.0000
hs040
4
3
−0.2500
hs046
5
2
0.0210
hs047
5
3
0.0000
hs048
5
2
0.0000
hs049
5
2
0.0000
hs050
5
3
0.0000
hs051
5
3
0.0000
hs052
5
3
5.3266
hs077
5
2
0.2415
hs078
5
3
−2.9197
hs079
5
3
0.0787
hs100lnp
7
2
680.6300
hs111lnp
10
3
−47.7614
maratos
2
1
−1.0000
mwright
5
3
24.9788
orthregb
27
6
0.0000
Table 3. Numerical comparisons (columns 2–6: Algorithm 1; columns 7–11: Algorithm 2).

| Name | iter | nf | ng | nc | cput | iter | nf | ng | nc | cput |
|---|---|---|---|---|---|---|---|---|---|---|
| bt2(*) | 19 | 22 | 39 | 41 | 0.464 | 33 | 44 | 67 | 77 | 0.059 |
| bt3(*) | 14 | 30 | 29 | 44 | 0.088 | 24 | 48 | 49 | 72 | 0.150 |
| bt4 | 17 | 39 | 35 | 56 | 0.056 | 27 | 65 | 55 | 92 | 0.072 |
| bt5 | 7 | 8 | 15 | 15 | 0.700 | 9 | 12 | 19 | 21 | 0.057 |
| bt6(*) | 30 | 61 | 61 | 91 | 0.135 | 28 | 45 | 57 | 73 | 0.232 |
| bt7 | 65 | 383 | 131 | 448 | 0.345 | – | – | – | – | – |
| bt9 | 14 | 26 | 29 | 40 | 0.056 | 14 | 25 | 29 | 39 | 0.051 |
| bt10(*) | 6 | 6 | 13 | 12 | 0.159 | 2 | 2 | 5 | 4 | 0.006 |
| bt11(*) | 12 | 18 | 25 | 30 | 0.208 | 12 | 16 | 25 | 28 | 0.087 |
| bt12(*) | 12 | 20 | 25 | 32 | 0.205 | 28 | 77 | 57 | 105 | 0.194 |
| catena | 23 | 35 | 44 | 48 | 2.608 | 31 | 39 | 47 | 55 | 2.887 |
| dtoc1nd | 24 | 29 | 35 | 35 | 3.007 | 24 | 30 | 33 | 37 | 3.153 |
| eigena2 | 26 | 95 | 55 | 122 | 1660.4 | 26 | 98 | 57 | 126 | 1780.4 |
| eigenaco | 2 | 4 | 5 | 6 | 121.399 | 2 | 4 | 5 | 6 | 114.150 |
| eigenb2 | 3 | 5 | 7 | 8 | 126.209 | 3 | 5 | 7 | 8 | 133.324 |
| eigenbco | 2 | 3 | 5 | 5 | 119.773 | 2 | 3 | 5 | 5 | 112.864 |
| eigenc2 | 3 | 6 | 7 | 7 | 134.112 | 3 | 6 | 7 | 7 | 129.316 |
| eigencco | 18 | 29 | 37 | 47 | 2.335 | – | – | – | – | – |
| fccu(*) | 22 | 52 | 45 | 74 | 1.527 | 21 | 48 | 43 | 69 | 1.285 |
| genhs28(*) | 6 | 7 | 13 | 13 | 0.222 | 8 | 9 | 17 | 17 | 0.398 |
| gilbert | 2 | 5 | 6 | 8 | 120.231 | 2 | 5 | 6 | 8 | 118.883 |
| hs006 | 19 | 71 | 39 | 90 | 0.061 | – | – | – | – | – |
| hs007 | 15 | 51 | 31 | 66 | 0.105 | 10 | 20 | 21 | 30 | 0.029 |
| hs008(*) | 2 | 3 | 5 | 5 | 0.130 | 2 | 3 | 5 | 5 | 0.005 |
| hs026 | 203 | 269 | 407 | 472 | 0.444 | 127 | 625 | 255 | 752 | 0.275 |
| hs027(*) | 206 | 291 | 412 | 311 | 69.903 | – | – | – | – | – |
| hs028(*) | 8 | 10 | 17 | 18 | 0.0292 | 9 | 11 | 19 | 20 | 0.0312 |
| hs039 | 14 | 26 | 29 | 40 | 0.084 | 14 | 25 | 29 | 39 | 0.083 |
| hs040(*) | 5 | 5 | 11 | 10 | 0.046 | 5 | 5 | 11 | 10 | 0.042 |
| hs046(*) | 62 | 497 | 124 | 559 | 3.117 | – | – | – | – | – |
| hs047 | 39 | 281 | 78 | 320 | 2.323 | 12 | 56 | 24 | 69 | 0.732 |
| hs048(*) | 8 | 17 | 17 | 25 | 0.047 | 9 | 18 | 19 | 27 | 0.047 |
| hs049(*) | 31 | 42 | 63 | 73 | 0.249 | 31 | 42 | 63 | 73 | 0.157 |
| hs050(*) | 15 | 26 | 31 | 41 | 0.101 | 15 | 26 | 31 | 41 | 0.097 |
| hs051(*) | 9 | 19 | 19 | 28 | 0.063 | 10 | 20 | 21 | 30 | 0.066 |
| hs052(*) | 14 | 37 | 29 | 51 | 0.098 | 11 | 29 | 23 | 40 | 0.075 |
| hs077(*) | 24 | 32 | 49 | 56 | 0.115 | 21 | 24 | 43 | 45 | 0.100 |
| hs078(*) | 6 | 7 | 13 | 13 | 0.540 | 6 | 7 | 13 | 13 | 0.107 |
| hs079(*) | 12 | 17 | 25 | 29 | 0.083 | 16 | 23 | 33 | 39 | 0.095 |
| hs100lnp | 51 | 22 | 103 | 133 | 0.407 | 57 | 89 | 115 | 146 | 0.440 |
| hs111lnp | 96 | 256 | 193 | 352 | 2.083 | 223 | 844 | 447 | 1067 | 4.870 |
| maratos(*) | 4 | 4 | 9 | 8 | 6.705 | 4 | 4 | 9 | 8 | 3.100 |
| mwright(*) | 21 | 37 | 43 | 58 | 139.119 | 24 | 77 | 49 | 101 | 158.353 |
| orthregb | 12 | 13 | 25 | 25 | 1.046 | 11 | 11 | 23 | 22 | 0.834 |
Table 4. Numerical comparisons, continued (columns 2–6: Algorithm 3; columns 7–11: Algorithm 4).

| Name | iter | nf | ng | nc | cput | iter | nf | ng | nc | cput |
|---|---|---|---|---|---|---|---|---|---|---|
| bt2(*) | 19 | 22 | 39 | 41 | 0.464 | 33 | 44 | 67 | 77 | 0.059 |
| bt3(*) | 14 | 30 | 29 | 44 | 0.088 | 24 | 48 | 49 | 72 | 0.150 |
| bt4 | 17 | 39 | 35 | 56 | 0.056 | 27 | 65 | 55 | 92 | 0.072 |
| bt5 | 10 | 13 | 21 | 23 | 0.040 | 8 | 8 | 17 | 16 | 0.034 |
| bt6(*) | 26 | 42 | 53 | 68 | 0.113 | 29 | 58 | 59 | 87 | 0.128 |
| bt7 | 58 | 270 | 117 | 328 | 0.305 | 32 | 118 | 65 | 150 | 0.167 |
| bt9 | 14 | 21 | 29 | 35 | 0.053 | 15 | 29 | 31 | 44 | 0.055 |
| bt10(*) | 4 | 4 | 9 | 8 | 0.010 | 7 | 7 | 15 | 14 | 0.016 |
| bt11(*) | 12 | 18 | 25 | 30 | 0.087 | 12 | 18 | 25 | 30 | 0.088 |
| bt12(*) | 40 | 135 | 81 | 175 | 0.272 | 15 | 26 | 31 | 41 | 0.106 |
| catena | 28 | 37 | 54 | 56 | 3.038 | 33 | 40 | 57 | 58 | 2.529 |
| dtoc1nd | 23 | 33 | 42 | 46 | 3.416 | 25 | 42 | 47 | 47 | 3.603 |
| eigena2 | 27 | 95 | 55 | 122 | 1574.0 | 26 | 98 | 57 | 126 | 1683.6 |
| eigenaco | 2 | 4 | 5 | 6 | 114.128 | 2 | 4 | 5 | 6 | 117.059 |
| eigenb2 | 3 | 5 | 7 | 8 | 154.210 | 3 | 5 | 7 | 8 | 143.225 |
| eigenbco | 2 | 3 | 5 | 5 | 112.406 | 2 | 3 | 5 | 5 | 118.161 |
| eigenc2 | 3 | 6 | 7 | 7 | 122.603 | 3 | 6 | 7 | 7 | 135.140 |
| eigencco | 16 | 26 | 33 | 42 | 2.096 | 18 | 31 | 37 | 49 | 2.373 |
| fccu(*) | 22 | 49 | 45 | 71 | 1.341 | 19 | 44 | 39 | 63 | 1.166 |
| genhs28(*) | 8 | 9 | 17 | 17 | 0.276 | 8 | 9 | 17 | 17 | 0.275 |
| gilbert | 2 | 5 | 6 | 8 | 134.221 | 2 | 5 | 6 | 8 | 127.802 |
| hs006 | – | – | – | – | – | 19 | 70 | 39 | 89 | 0.048 |
| hs007 | 10 | 23 | 21 | 33 | 0.030 | 9 | 23 | 19 | 32 | 0.024 |
| hs008(*) | 3 | 4 | 7 | 7 | 0.007 | 2 | 3 | 5 | 5 | 0.05 |
| hs026 | 44 | 137 | 89 | 181 | 0.095 | 157 | 924 | 315 | 1081 | 0.339 |
| hs027(*) | – | – | – | – | – | – | – | – | – | – |
| hs028(*) | 9 | 11 | 19 | 20 | 0.030 | 8 | 10 | 17 | 18 | 0.027 |
| hs039 | 14 | 21 | 29 | 35 | 0.081 | 15 | 29 | 31 | 44 | 0.090 |
| hs040(*) | 5 | 5 | 11 | 10 | 0.044 | 5 | 5 | 11 | 10 | 0.041 |
| hs046(*) | 61 | 515 | 123 | 577 | 2.961 | 180 | 188 | 360 | 206 | 8.738 |
| hs047 | 89 | 89 | 179 | 478 | 0.524 | – | – | – | – | – |
| hs048(*) | 9 | 18 | 19 | 27 | 0.046 | 8 | 17 | 17 | 25 | 0.043 |
| hs049(*) | 31 | 42 | 63 | 73 | 0.154 | 31 | 42 | 63 | 73 | 0.148 |
| hs050(*) | 15 | 26 | 31 | 41 | 0.097 | 15 | 26 | 31 | 41 | 0.094 |
| hs051(*) | 10 | 20 | 21 | 30 | 0.066 | 11 | 23 | 23 | 34 | 0.072 |
| hs052(*) | 11 | 29 | 23 | 40 | 0.074 | 10 | 25 | 21 | 35 | 0.069 |
| hs077(*) | 23 | 32 | 47 | 55 | 0.109 | 23 | 28 | 47 | 51 | 0.106 |
| hs078(*) | 6 | 7 | 13 | 13 | 0.040 | 6 | 7 | 13 | 13 | 0.041 |
| hs079(*) | 15 | 21 | 31 | 36 | 0.098 | 13 | 16 | 27 | 29 | 0.081 |
| hs100lnp | 41 | 78 | 83 | 119 | 0.320 | 76 | 248 | 553 | 256 | 21.461 |
| hs111lnp | 86 | 201 | 173 | 287 | 1.855 | 106 | 306 | 213 | 412 | 2.279 |
| maratos(*) | 3 | 3 | 7 | 6 | 2.414 | 4 | 4 | 9 | 8 | 3.097 |
| mwright(*) | 19 | 51 | 39 | 70 | 142.905 | 12 | 28 | 25 | 40 | 93.680 |
| orthregb | 13 | 15 | 27 | 28 | 0.982 | 12 | 13 | 25 | 25 | 0.897 |
Table 5. Numerical comparisons, continued (columns 2–6: Algorithm 5; columns 7–11: Algorithm 6).

| Name | iter | nf | ng | nc | cput | iter | nf | ng | nc | cput |
|---|---|---|---|---|---|---|---|---|---|---|
| bt2(*) | 19 | 22 | 39 | 41 | 0.464 | 33 | 44 | 67 | 77 | 0.059 |
| bt3(*) | 14 | 30 | 29 | 44 | 0.088 | 24 | 48 | 49 | 72 | 0.150 |
| bt4 | 17 | 39 | 35 | 56 | 0.056 | 27 | 65 | 55 | 92 | 0.072 |
| bt5 | 8 | 9 | 17 | 17 | 0.035 | 10 | 14 | 21 | 24 | 0.042 |
| bt6(*) | 37 | 76 | 75 | 113 | 0.159 | 29 | 54 | 59 | 83 | 0.128 |
| bt7 | – | – | – | – | – | 27 | 22 | 55 | 109 | 0.141 |
| bt9 | 15 | 27 | 31 | 42 | 0.058 | 12 | 15 | 25 | 27 | 0.044 |
| bt10(*) | 2 | 2 | 5 | 4 | 0.005 | 4 | 4 | 9 | 8 | 0.010 |
| bt11(*) | 14 | 20 | 29 | 34 | 0.101 | 11 | 15 | 23 | 26 | 0.081 |
| bt12(*) | 33 | 96 | 67 | 129 | 0.227 | 18 | 34 | 37 | 52 | 0.125 |
| catena | 31 | 37 | 46 | 49 | 2.317 | 33 | 41 | 45 | 53 | 3.004 |
| dtoc1nd | 25 | 28 | 54 | 65 | 2.815 | 26 | 38 | 54 | 67 | 3.744 |
| eigena2 | 27 | 95 | 55 | 122 | 1576.2 | 27 | 95 | 55 | 122 | 1486.5 |
| eigenaco | 2 | 4 | 5 | 6 | 118.047 | 2 | 4 | 5 | 6 | 116.598 |
| eigenb2 | 3 | 5 | 7 | 8 | 118.710 | 3 | 5 | 7 | 8 | 121.337 |
| eigenbco | 2 | 3 | 5 | 5 | 113.767 | 2 | 3 | 5 | 5 | 115.366 |
| eigenc2 | 3 | 6 | 7 | 7 | 125.706 | 3 | 6 | 7 | 7 | 119.482 |
| eigencco | – | – | – | – | – | 16 | 24 | 33 | 40 | 2.034 |
| fccu(*) | 20 | 47 | 41 | 67 | 1.228 | 21 | 47 | 43 | 68 | 1.294 |
| genhs28(*) | 8 | 9 | 17 | 17 | 0.278 | 8 | 9 | 17 | 17 | 0.277 |
| gilbert | 2 | 5 | 6 | 8 | 131.025 | 2 | 5 | 6 | 8 | 129.304 |
| hs006 | – | – | – | – | – | – | – | – | – | – |
| hs007 | 10 | 22 | 21 | 32 | 0.030 | 12 | 28 | 25 | 40 | 0.032 |
| hs008(*) | 2 | 3 | 5 | 5 | 0.005 | 3 | 4 | 7 | 7 | 0.010 |
| hs026 | 127 | 630 | 255 | 757 | 0.272 | 37 | 101 | 75 | 138 | 0.081 |
| hs027(*) | – | – | – | – | – | – | – | – | – | – |
| hs028(*) | 9 | 11 | 19 | 20 | 0.033 | 9 | 11 | 19 | 20 | 0.033 |
| hs039 | 15 | 27 | 31 | 42 | 0.088 | 12 | 15 | 25 | 27 | 0.077 |
| hs040(*) | 5 | 5 | 11 | 10 | 0.045 | 5 | 5 | 11 | 10 | 0.045 |
| hs046(*) | – | – | – | – | – | 189 | 203 | 379 | 222 | 9.682 |
| hs047 | – | – | – | – | – | 44 | 117 | 89 | 161 | 0.257 |
| hs048(*) | 9 | 18 | 19 | 27 | 0.043 | 9 | 18 | 19 | 27 | 0.045 |
| hs049(*) | 31 | 42 | 63 | 73 | 0.149 | 31 | 42 | 63 | 73 | 0.156 |
| hs050(*) | 15 | 26 | 31 | 41 | 0.094 | 15 | 26 | 31 | 41 | 0.099 |
| hs051(*) | 10 | 20 | 21 | 30 | 0.064 | 10 | 20 | 21 | 30 | 0.064 |
| hs052(*) | 11 | 29 | 23 | 40 | 0.073 | 12 | 31 | 25 | 43 | 0.082 |
| hs077(*) | 22 | 25 | 45 | 47 | 0.103 | 23 | 31 | 47 | 54 | 0.108 |
| hs078(*) | 6 | 7 | 13 | 13 | 0.041 | 6 | 7 | 13 | 13 | 0.040 |
| hs079(*) | 12 | 18 | 25 | 30 | 0.077 | 12 | 19 | 25 | 31 | 0.077 |
| hs100lnp | 372 | 390 | 914 | 647 | 44.907 | 270 | 254 | 541 | 281 | 21.221 |
| hs111lnp | 123 | 353 | 247 | 476 | 2.664 | 106 | 280 | 213 | 386 | 2.286 |
| maratos(*) | 4 | 4 | 9 | 8 | 3.134 | 3 | 3 | 7 | 6 | 2.453 |
| mwright(*) | 18 | 32 | 37 | 50 | 173.467 | 11 | 24 | 23 | 35 | 83.719 |
| orthregb | 11 | 11 | 23 | 22 | 0.820 | 13 | 15 | 27 | 28 | 0.985 |
For all problems solved successfully, every algorithm attained the best-known objective value listed in Table 2 (within the termination tolerance). Comparing the six algorithms, we see that Algorithm 1, which successfully solves all of the problems, is superior to the others. These numerical results indicate that the proposed algorithms are feasible and effective.
In the following, we compare Algorithm 1 with Algorithm INS in terms of the number of iterations. Algorithm INS (inexact Newton with SMART tests) was proposed in Byrd et al.9 and is also designed for nonconvex equality constrained optimization. Since both algorithms target nonconvex problems, we restrict the comparison to a selection of larger-scale nonconvex equality constrained test problems. The data are listed in Table 6, where iter-A1 and iter-INS stand for the numbers of iterations of Algorithm 1 and Algorithm INS, respectively. The comparison suggests that Algorithm 1 is slightly superior to Algorithm INS.
Table 6. Iteration statistics for Algorithm 1 and Algorithm INS.

| Name | n | m | iter-A1 | iter-INS | Name | n | m | iter-A1 | iter-INS |
|---|---|---|---|---|---|---|---|---|---|
| catena | 32 | 11 | 23 | 48 | dtoc1nd | 735 | 490 | 24 | 49 |
| eigena2 | 110 | 55 | 26 | 148 | eigenaco | 110 | 55 | 2 | 28 |
| eigenb2 | 110 | 55 | 3 | 21 | eigenbco | 110 | 55 | 2 | 218 |
| eigenc2 | 462 | 231 | 3 | 130 | eigencco | 30 | 15 | 18 | 172 |
| gilbert | 1000 | 1 | 2 | 22 | hs026 | 3 | 1 | 203 | 203 |
| hs039 | 4 | 2 | 14 | 17 | orthregb | 27 | 6 | 12 | 21 |
Conclusions
We have presented and analyzed a family of inexact secant methods for nonconvex equality constrained optimization. By combining these algorithms with an Armijo line search technique and adopting the nonsmooth exact penalty function as the merit function, we have proved that the proposed methods are globally convergent under standard assumptions. The numerical results on a large set of test problems show that the proposed methods exhibit good practical performance in terms of efficiency and robustness. We believe that these inexact secant methods will find applications in more diverse areas.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported partially by the National Natural Science Foundation of China (Grant 11571074) and the Natural Science Foundation of Hunan Province (Grant 2016JJ2038).
References
1. Coleman TF, Conn AR. On the local convergence of a quasi-Newton method for the nonlinear programming problem. SIAM J Numer Anal 1984; 21: 755–769.
2. Nocedal J, Overton ML. Projected Hessian updating algorithms for nonlinearly constrained optimization. SIAM J Numer Anal 1985; 22: 821–850.
3. Fontecilla R. Local convergence of secant methods for nonlinear constrained optimization. SIAM J Numer Anal 1988; 25: 692–712.
4. Byrd RH, Schnabel RB. Continuity of the null space basis and constrained optimization. Math Program 1986; 35: 32–41.
5. Byrd RH. On the convergence of constrained optimization methods with accurate Hessian information on a subspace. SIAM J Numer Anal 1990; 27: 141–153.
6. Byrd RH, Curtis FE, Nocedal J. An inexact SQP method for equality constrained optimization. SIAM J Optim 2008; 19: 351–369.
7. Dembo RS, Eisenstat SC, Steihaug T. Inexact Newton methods. SIAM J Numer Anal 1982; 19: 400–408.
8. Dembo RS, Steihaug T. Truncated-Newton algorithms for large-scale unconstrained optimization. Math Program 1983; 26: 190–212.
9. Byrd RH, Curtis FE, Nocedal J. An inexact Newton method for nonconvex equality constrained optimization. Math Program 2010; 122: 273–299.
10. Wang ZJ, Zhu DT, Nie CY. A filter line search algorithm based on an inexact Newton method for nonconvex equality constrained optimization. Acta Math Appl Sin Engl Ser 2017; 33: 687–698.
11. Gu C, Zhu DT. An inexact secant algorithm for large scale nonlinear systems of equalities and inequalities. Appl Math Model 2012; 36: 3612–3620.
12. Wang ZJ, Cai L, Zhu DT. Line search filter inexact secant methods for nonlinear equality constrained optimization. Appl Math Comp 2015; 263: 47–58.