Sage Journals: Discover world-class research

Abstract

Lanczos iterative methods for solving large sparse linear systems are prone to breakdown, a latent problem that can strike whenever these methods are deployed. A number of approaches to deal with this issue have been investigated. One of them is to switch between solvers preemptively, before breakdown occurs. However, the problem is not fully solved yet. Here, we propose switching models combined with particular Lanczos iterative methods. The first model uses the last iterate before breakdown as the switching point, with an unlimited number of iterations; the second model uses the iterate with the minimum residual norm as the initial point; and the third model uses the iterate with the minimum of the minimum residual norms as the switching point. These three models lead to the algorithms SLULast, SLUMinRes, and SLUMoM, respectively. A parallel version of the proposed algorithms is also provided to reduce their computational time; specifically, we constructed a parallel version of SLUMoM, which we call pSLUMoM. The numerical results showed that our switching models performed better than the existing switching strategy in terms of robustness and efficiency. In fact, under a parallel framework, pSLUMoM reduced the computational time of SLUMoM by up to 50% in our experiments.

Keywords

Lanczos iterative methods, sparse linear system, breakdown, switching models, parallel computing

Introduction

Today most iterative methods for solving large sparse linear systems are based on Krylov subspace methods. Consider a nonsingular system

$$A x = b \qquad (1)$$

where $A \in \mathbb{R}^{n \times n}$ and $x, b \in \mathbb{R}^{n}$. Krylov subspace methods are polynomial iterative methods that solve (1) by finding a sequence of iterates $x_k$ which satisfy^1,2

$$r_k = b - A x_k \perp L_k \qquad (2)$$

where $L_k = K_k(A^T, y) = \operatorname{span}\{y, A^T y, \dots, (A^T)^{k-1} y\}$ for any nonzero vector $y$.³ This leads to the so-called Lanczos method, one of the popular Krylov solvers and the one considered in this study.⁴

The original Lanczos method has been improved over decades. One interesting improvement was proposed by Brezinski,^5,3 based on the theory of formal orthogonal polynomials (FOPs), as follows. Define the residual vector $r_k$ as

$$r_k = P_k(A)\, r_0 \qquad (3)$$

where $P_k(A) = 1 + \alpha_1 A + \dots + \alpha_k A^k$ is an FOP of degree at most $k$ satisfying the normality condition $P_k(0) = 1$. Let $c$ be the linear functional on polynomials defined by

$$c(t^i) = c_i = \langle y, A^i r_0 \rangle, \quad i = 0, 1, \dots, n - 1 \qquad (4)$$

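Condition (2) requires $\langle (A^T)^i y, r_k \rangle = 0$ for $i = 0, 1, \dots, k - 1$. Since $\langle (A^T)^i y, P_k(A) r_0 \rangle = \langle y, A^i P_k(A) r_0 \rangle$, combining (2) with (3) and (4) gives

$$c\big(t^i P_k(t)\big) = 0, \quad i = 0, 1, \dots, k - 1 \qquad (5)$$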

In short, Lanczos-type algorithms recursively compute the orthogonal polynomials $P_k$, the residuals $r_k = P_k(A) r_0$, and the iterates $x_k$, which satisfy $r_k = b - A x_k$. As a result, several Lanczos-type formulas can be derived from (5).⁶
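The moments in (4) need only repeated products with $A$, never explicit powers of $A$. The following Python sketch (the function name `moments` and the dense list-of-rows storage are our own illustrative assumptions, not the authors' implementation) computes $c_0, \dots, c_{count-1}$:

```python
def moments(A, y, r0, count):
    """Return [c_0, ..., c_{count-1}] with c_i = <y, A^i r0>, as in (4).

    A is a dense matrix stored as a list of rows; only mat-vecs are used.
    """
    def matvec(M, v):
        return [sum(a * vj for a, vj in zip(row, v)) for row in M]

    out, v = [], list(r0)
    for _ in range(count):
        out.append(sum(yi * vi for yi, vi in zip(y, v)))  # c_i = <y, v> with v = A^i r0
        v = matvec(A, v)                                   # advance to A^{i+1} r0
    return out


# Diagonal example: c_i = 2**i + 3**i
print(moments([[2, 0], [0, 3]], [1, 1], [1, 1], 4))  # [2, 5, 13, 35]
```

This is for illustration only: explicit moments $c_i$ for large $i$ are badly scaled and numerically ill-conditioned, which is why practical Lanczos-type algorithms work with recurrences instead.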

In exact arithmetic, Lanczos methods terminate after some $k \leq n$ iterations with $r_k = 0$ and $x_k = x$; in floating-point arithmetic, however, even a large number of iterations often yields only a good approximate solution. This is due to the accumulation of rounding errors, which causes some of the $P_k$ to lose their orthogonality. The worst case is when the algorithm stops abruptly without producing a solution. This situation is called breakdown.³

Breakdown is a latent problem of Lanczos methods, and of other non-stationary iterative methods, that can manifest itself whenever such methods are deployed. It mainly occurs when the orthogonal polynomials $P_k$ do not exist, which shows up as a division by zero in the computation of their coefficients. To illustrate this situation, consider a Lanczos formula built on the three-term recurrence

$$P_{k+1}(t) = (\alpha t + \beta) P_k(t) + \gamma P_{k-1}(t), \quad k = 0, 1, \dots \qquad (6)$$

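Multiplying (6) by $t^i$ and applying the functional $c$ gives

$$c\big(t^i P_{k+1}(t)\big) = \alpha\, c\big(t^{i+1} P_k(t)\big) + \beta\, c\big(t^i P_k(t)\big) + \gamma\, c\big(t^i P_{k-1}(t)\big) \qquad (7)$$

for $i = 0, 1, \dots, k$. By (5) applied to $P_{k+1}$, the left-hand side must vanish for these $i$. For $i \leq k - 2$ every term on the right-hand side vanishes automatically, so only $i = k - 1$ and $i = k$ impose conditions. Taking $\alpha = 1$, the resulting linear system has the solution

$$\alpha = 1, \qquad \gamma = -\frac{c\big(t^k P_k(t)\big)}{c\big(t^{k-1} P_{k-1}(t)\big)}, \qquad \beta = -\frac{c\big(t^{k+1} P_k(t)\big) + \gamma\, c\big(t^k P_{k-1}(t)\big)}{c\big(t^k P_k(t)\big)} \qquad (8)$$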

If $c\big(t^{k-1} P_{k-1}(t)\big) = 0$, the coefficient $\gamma$ in (8) involves a division by zero and the Lanczos formula cannot be evaluated. This is called a true breakdown.³
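The role of the denominators can be made concrete with a small exact-arithmetic Python sketch. It builds $P_0, \dots, P_k$ from the moments $c_i$ using (6), solving the conditions from (7) for $\beta$ and $\gamma$ with $\alpha = 1$ (so the polynomials are monic, a normalization different from $P_k(0) = 1$ in (3)), and raises an exception when a denominator vanishes. All names are our own; the sketch illustrates the recurrence and is not one of the Lanczos-type solvers used in this article.

```python
from fractions import Fraction


class Breakdown(Exception):
    """Raised when a denominator of the recurrence coefficients vanishes."""


def c_of(moments, poly, shift=0):
    """c(t^shift * P(t)) = sum_j p_j c_{shift+j}; poly lists coefficients, lowest degree first."""
    return sum(p * moments[shift + j] for j, p in enumerate(poly))


def axpy_poly(*terms):
    """Sum of scaled polynomials given as (scale, coefficient list) pairs."""
    out = [0] * max(len(p) for _, p in terms)
    for s, p in terms:
        for j, a in enumerate(p):
            out[j] += s * a
    return out


def three_term_fops(moments, degree, tol=0):
    """Build P_0, ..., P_degree with recurrence (6), alpha = 1; needs c_0, ..., c_{2*degree-1}."""
    prev, cur = [0], [1]                              # P_{-1} = 0, P_0 = 1
    polys = [cur]
    for k in range(degree):
        d_beta = c_of(moments, cur, k)                # c(t^k P_k)
        if k == 0:
            gamma = 0
        else:
            d_gamma = c_of(moments, prev, k - 1)      # c(t^{k-1} P_{k-1})
            if abs(d_gamma) <= tol:
                raise Breakdown(f"true breakdown at k={k}: c(t^(k-1) P_(k-1)) = 0")
            gamma = -d_beta / d_gamma
        if abs(d_beta) <= tol:
            raise Breakdown(f"breakdown at k={k}: c(t^k P_k) = 0")
        beta = -(c_of(moments, cur, k + 1) + gamma * c_of(moments, prev, k)) / d_beta
        # P_{k+1} = (t + beta) P_k + gamma P_{k-1}
        nxt = axpy_poly((1, [0] + cur), (beta, cur), (gamma, prev))
        prev, cur = cur, nxt
        polys.append(cur)
    return polys


c = [Fraction(1, i + 1) for i in range(4)]            # c(t^i) = integral of t^i over [0, 1]
print([str(p) for p in three_term_fops(c, 2)[2]])     # ['1/6', '-1', '1']
```

With the moments of $\int_0^1 t^i\,dt$ the sketch reproduces the monic shifted Legendre polynomial $t^2 - t + 1/6$. With the moments $(1, 0, 0, 1)$ the denominator $c(t P_1(t))$ vanishes at $k = 1$ and `Breakdown` is raised.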

A number of approaches have been proposed to handle breakdown in Lanczos-type algorithms. The best known, proposed in the 1990s, jumps over the non-existing orthogonal polynomials and is called the look-ahead method or the method of recursive zoom (MRZ).³ Other techniques, such as restarting the same algorithm and switching between different Lanczos-type algorithms, were considered by Farooq and Salhi⁷ and Maharani and Salhi.⁸ These approaches were shown to perform better than MRZ in terms of robustness. Furthermore, approaches that speed up the convergence of Lanczos-type algorithms by embedding interpolation and extrapolation models were proposed by Maharani et al.,^9,10 including parallel versions and their implementation on cloud computing.¹¹ More recently, Lanczos-type algorithms have been hybridized with machine learning: Thalib et al.¹² combined the support vector machine (SVM) with Lanczos algorithms to solve nonlinear problems and applied this approach to engineering problems.¹³ Finally, enhancements of Lanczos-type algorithms were summarized by Maharani et al.¹⁴

In this article, we focus on switching between Lanczos-type algorithms, paying attention to the quality of the point from which we switch. We also avoid fixing the number of iterations and instead monitor the solution process for signs of breakdown. Note that the existing switching strategy⁷ switches after a fixed number of iterations, which has the disadvantages explained below.

The switching strategy allows us to switch from one algorithm to another before the first one breaks down. Switching changes the polynomial basis of the Krylov subspace, and therefore breakdown can be avoided. Since we do not know when a breakdown is likely to occur, fixing the number of iterations is arbitrary and may be counter-productive. If it is too low, we switch well before breakdown, which means that the solution process is halted unnecessarily early, potentially losing the opportunity for further progress. If, on the other hand, it is too large, breakdown may occur well before that number of iterations is reached, which means that we have to reset the process and restart it from scratch; this is time consuming and may require manual intervention. A preemptive switch based on monitoring the progress of the solution process is therefore desirable, since one only switches near the point of breakdown. In some situations, there may be no need to switch at all. Strategies without a fixed number of iterations before switching are, therefore, attractive. The progress of the solution process can be monitored by checking the quality of the iterates and the denominators of the coefficients used to compute them. When these denominators become too small, orthogonality is lost and, just before breakdown, the quality of the iterates deteriorates in terms of the residual norm.⁸ Switching to another Lanczos-type algorithm once this has been observed is therefore recommended, and this is what is implemented in this article.

Note that, before breakdown is reached, some good points have already been generated. The quality of a point is measured by its residual norm, and the point with the lowest residual norm can be used as a switching point. Switching from this point is a sensible choice, since it is the best iterate among those generated by the Lanczos algorithm. This switching rule is adapted from the restarting strategy discussed by Maharani and Salhi.⁸ Furthermore, another switching model is proposed in this article, namely switching based on the best among the collection of iterates with the lowest residual norms of several algorithms. This model is promising since it can reduce the computational time and, moreover, it is suitable for a parallel implementation.

This article is organized as follows. After this introduction, the “Switching between Lanczos-type algorithms: Traditional switching strategy” section reviews the derivation of Lanczos-type algorithms via the Krylov subspace method, the breakdown phenomenon in Lanczos-type algorithms, and the traditional switching strategy. The “Proposed switching models” section presents our switching models. The “Results and discussion” section presents both the theoretical and numerical results, the “Parallel of switching models” section describes the parallel version, and the discussion and conclusion are given in the final sections.

Switching between Lanczos-type algorithms: Traditional switching strategy

The switching approach was first introduced by Farooq and Salhi⁷ to combat breakdown in Lanczos-type algorithms. Using Brezinski’s scheme, the Lanczos Orthores algorithm is based on the recurrence

$$P_k(A) = C_k P_{k-2}(A) + (D_k A + E_k) P_{k-1}(A) \qquad (9)$$

where $C_k$, $D_k$, and $E_k$ are coefficients computed using the theory of FOPs.⁶ Applying relation (3), the iterates $x_k$ are computed by

$$x_k = E_k x_{k-1} + C_k x_{k-2} - D_k r_{k-1}, \qquad r_k = b - A x_k \qquad (10)$$

The normality condition $P_k(0) = 1$ applied to (9) gives $C_k + E_k = 1$, which is what makes (10) consistent with (9).
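Update (10) needs only vector operations and one product with $A$ per step. In the hedged Python sketch below, the coefficients $C_k$, $D_k$, $E_k$ are supplied by the caller rather than computed from the FOP theory; it performs one step and checks it against (9): when $C_k + E_k = 1$, the residual returned by (10) equals $C_k r_{k-2} + (D_k A + E_k) r_{k-1}$. The function names and the numbers in the check are arbitrary illustrative choices.

```python
def matvec(A, v):
    """Dense matrix-vector product with A stored as a list of rows."""
    return [sum(a * vj for a, vj in zip(row, v)) for row in A]


def orthores_step(A, b, x_km2, x_km1, r_km1, C, D, E):
    """One update of (10): x_k = E x_{k-1} + C x_{k-2} - D r_{k-1}, r_k = b - A x_k.

    C, D, E are taken as given; computing them from the FOP theory is not shown.
    """
    x_k = [E * x1 + C * x2 - D * r for x1, x2, r in zip(x_km1, x_km2, r_km1)]
    r_k = [bi - ai for bi, ai in zip(b, matvec(A, x_k))]
    return x_k, r_k


# Compare with the polynomial form (9) applied to the residuals, using C + E = 1
A, b = [[2.0, 1.0], [0.0, 3.0]], [1.0, 1.0]
x_km2, x_km1 = [0.0, 0.0], [0.5, 0.0]
r_km2 = [bi - ai for bi, ai in zip(b, matvec(A, x_km2))]
r_km1 = [bi - ai for bi, ai in zip(b, matvec(A, x_km1))]
C, D, E = 0.25, 0.1, 0.75
x_k, r_k = orthores_step(A, b, x_km2, x_km1, r_km1, C, D, E)
Ar = matvec(A, r_km1)
r_poly = [C * u + D * w + E * v for u, w, v in zip(r_km2, Ar, r_km1)]
print(r_k, r_poly)   # equal up to rounding
```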

Breakdown occurs when the polynomial (9) does not exist, which manifests itself as infinite values of some of the coefficients $C_k$, $D_k$, or $E_k$ in (10). Suppose we apply (10) to solve (1) while monitoring the process preemptively for breakdown. If we stop after $k$ iterations, we collect all $k$ iterates together with their residual norms

$$S = \{x_1, x_2, \dots, x_m, \dots, x_k\} \qquad (11)$$

$$V = \{\|r_1\|, \|r_2\|, \dots, \|r_m\|, \dots, \|r_k\|\} \qquad (12)$$

where $x_m$, $m \leq k$, is the iterate with the lowest residual norm $\|r_m\|$. One iterate of $S$ is then used as the initial guess for the next cycle. The classical switching method takes $x_k$ as the initial guess, but it switches after a fixed number of iterations.

Proposed switching models

Switching model based on last iterate with unfixed number of iterations

We improve on the fixed number of iterations of the switching method by allowing an unfixed number, as follows. Consider another Lanczos-type algorithm, say Orthodir, to which we switch after the Orthores algorithm breaks down. The Orthodir algorithm combines two Lanczos formulas, called $A_8$ and $B_6$, generated by the orthogonal polynomials $P_k(A) = C_k A P^{(1)}_{k-1}(A) + P_{k-1}(A)$ and $P^{(1)}_k(A) = B_k P^{(1)}_{k-2}(A) + (A + D_k) P^{(1)}_{k-1}(A)$, respectively, where $B_k$, $C_k$, and $D_k$ are coefficients computed using the theory of FOPs. This algorithm generates the sequence of approximate solutions

$$x_k = x_{k-1} - z_{k-1} \qquad (13)$$

where $z_k = P^{(1)}_k(A) z_0$ and $z_0 = r_0$. We initialize the Orthodir algorithm with

$$x_0 = x_m \qquad (14)$$

$$y = b - A x_0 \qquad (15)$$

At the first Orthodir iteration, $k = 1$, we have $x_1 = x_0 - z_0$. By (14), $z_0 = r_0 = b - A x_m = r_m$, so $x_1 = x_m - r_m$. Since $\|r_m\|$ is the minimum residual norm of the Orthores process, we assume it is small, that is, $r_m$ is close to the zero vector. Consequently $\|x_1 - x_m\| = \|r_m\|$ is small and $x_1$ stays close to $x_m$, a good approximate solution. Continuing this process, Orthodir generates another sequence of approximate solutions, which again contains an iterate $x_m$ with the lowest residual norm. Repeating these steps with other Lanczos-type algorithms, we obtain a sequence of good approximate solutions.

Consider the Lanczos-type algorithms $A_{12}$, $A_8/B_6$, $A_8/B_8$, and $A_4$. Pick one of them, say $A_{12}$, and run it until it breaks down. If it breaks down at iteration $k$, we pick $x_{k-1}$. We then choose another algorithm, say $A_8/B_6$, and initialize it as follows:

$$x_0 = x_{k-1} \qquad (16)$$

$$y = b - A x_0 \qquad (17)$$

We run the $A_8/B_6$ algorithm until it breaks down, say at iteration $s$. Then we pick $x_{s-1}$ and use it to initialize the $A_8/B_8$ algorithm:

$$x_0 = x_{s-1} \qquad (18)$$

$$y = b - A x_0 \qquad (19)$$

We repeat these steps, cycling through $A_{12}$, $A_8/B_6$, $A_8/B_8$, and $A_4$, until the residual norm of the current iterate is less than a given tolerance.

The whole procedure is summarized in Algorithm 1, called SLULast (switching Lanczos-type algorithms for an unlimited number of iterations from the last iterate immediately prior to breakdown).

Algorithm 1.

The SLULast algorithm

1: $x_0 = 0$, $y = b - A x_0$

2: $temp = 0$

3: for $i = 1, 2, \dots$ do

4: $temp = temp + 1$

5: if $temp \equiv 1 \pmod 4$ then

6: Run the $A_{12}$ algorithm until it stops ($\|r_k\| = \mathrm{Inf}$ or $\|r_k\| = \mathrm{NaN}$)

7: Collect $sol_{last} = x_{k-1}$

8: else if $temp \equiv 2 \pmod 4$ then

9: Run the $A_8/B_6$ algorithm until it stops ($\|r_m\| = \mathrm{Inf}$ or $\|r_m\| = \mathrm{NaN}$)

10: Collect $sol_{last} = x_{m-1}$

11: else if $temp \equiv 3 \pmod 4$ then

12: Run the $A_8/B_8$ algorithm until it stops ($\|r_l\| = \mathrm{Inf}$ or $\|r_l\| = \mathrm{NaN}$)

13: Collect $sol_{last} = x_{l-1}$

14: else

15: Run the $A_4$ algorithm until it stops ($\|r_s\| = \mathrm{Inf}$ or $\|r_s\| = \mathrm{NaN}$)

16: Collect $sol_{last} = x_{s-1}$

17: end if

18: if $\|b - A\, sol_{last}\| < \epsilon$ then

19: $x = sol_{last}$

20: STOP.

21: end if

22: Initialization:

$$x_0 = sol_{last} \qquad (20)$$

$$y = b - A x_0 \qquad (21)$$

23: end for
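The control flow of Algorithm 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: each Lanczos-type algorithm is represented by a hypothetical callable that returns the iterates it produced before stopping, breakdown is detected as a non-finite residual norm, and the stand-in solver `jacobi_burst` (a few Jacobi sweeps followed by a NaN iterate) only imitates a breakdown; it is not one of $A_{12}$, $A_8/B_6$, $A_8/B_8$, or $A_4$.

```python
import math


def residual_norm(A, b, x):
    """Euclidean norm of r = b - A x (A dense, stored as a list of rows)."""
    return math.sqrt(sum((bi - sum(a * xj for a, xj in zip(row, x))) ** 2
                         for row, bi in zip(A, b)))


def slulast(A, b, solvers, tol=1e-13, max_cycles=100):
    """Round-robin switching in the spirit of Algorithm 1 (SLULast).

    solvers: callables solver(A, b, x0) -> list of iterates produced until the
    method stops; a breakdown shows up as a non-finite residual norm.
    The last finite iterate (x_{k-1} in Algorithm 1) becomes x0 of the next solver.
    """
    x0 = [0.0] * len(b)
    for cycle in range(max_cycles):
        solver = solvers[cycle % len(solvers)]
        finite = [x for x in solver(A, b, x0)
                  if math.isfinite(residual_norm(A, b, x))]
        if finite:
            x0 = finite[-1]                           # sol_last
        if residual_norm(A, b, x0) < tol:             # stopping test inside the loop
            return x0, cycle + 1
    return x0, max_cycles


def jacobi_burst(steps):
    """Hypothetical stand-in solver: `steps` Jacobi sweeps, then a NaN iterate."""
    def solver(A, b, x0):
        x, out = list(x0), []
        for _ in range(steps):
            x = [(bi - sum(a * xj for j, (a, xj) in enumerate(zip(row, x)) if j != i)) / row[i]
                 for i, (row, bi) in enumerate(zip(A, b))]
            out.append(x)
        out.append([math.nan] * len(b))               # mimic a breakdown
        return out
    return solver


x, cycles = slulast([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0],
                    [jacobi_burst(3), jacobi_burst(5)], tol=1e-10)
print(x, cycles)   # x is close to [1/11, 7/11]
```

Placing the tolerance check inside the loop lets the driver stop as soon as a cycle delivers an acceptable iterate; the round-robin index plays the role of $temp \bmod 4$.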

Switching model based on iterate with the minimum residual norm for unfixed number of iterations

We propose the SLUMinRes algorithm (switching Lanczos-type algorithms using the iterate with the minimum residual norm for an unfixed number of iterations), presented in Algorithm 3. As explained in the previous section, we switch between Lanczos-type algorithms, namely $A_{12}$, $A_8/B_6$ (Orthodir), $A_8/B_8$, and $A_4$ (Orthores), and every time we switch, we use relations (14) and (15) for the initialization. In the implementation, we designed the MinRes algorithm (Algorithm 2) to compute the iterate with the minimum residual norm, and we call this function within SLUMinRes.

Algorithm 2.

The MinRes algorithm

1: Collect all iterates $S$ as in (11).

2: Collect the corresponding residual norms $V$ as in (12).

3: Compute the minimum residual norm

$$norm_{min} = \min(V) \qquad (22)$$

4: Obtain the approximate solution and its residual norm, $sol_{min} = S(m)$ and $norm_{min} = \|r_m\|$, where $m$ is the index at which the minimum in (22) is attained.

5: STOP.
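Step 3 of Algorithm 2 takes a minimum over $V$, but the last entries of $V$ can be Inf or NaN when the Lanczos-type algorithm breaks down. A small Python sketch of MinRes that skips such entries (the names `S` and `V` follow (11) and (12); skipping non-finite values is our own assumption about how such entries are handled):

```python
import math


def min_res(S, V):
    """Algorithm 2 (MinRes): return (sol_min, norm_min) from iterates S and norms V.

    Inf/NaN norms (produced at breakdown) are skipped; ties keep the earliest iterate.
    """
    best = None
    for x, nrm in zip(S, V):
        if math.isfinite(nrm) and (best is None or nrm < best[1]):
            best = (x, nrm)
    if best is None:
        raise ValueError("V contains no finite residual norm")
    return best


S = ["x1", "x2", "x3", "x4"]          # placeholders for the iterates x_1, ..., x_k
V = [1e-3, 2e-6, 5e-5, math.inf]
print(min_res(S, V))                  # ('x2', 2e-06)
```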

Algorithm 3.

The SLUMinRes algorithm

1: $x_0 = 0$, $y = b - A x_0$

2: $temp = 0$

3: for $a = 1, 2, \dots$ do

4: $temp = temp + 1$

5: if $temp \equiv 1 \pmod 4$ then

6: Run the $A_{12}$ algorithm until it stops, starting from Step 1 in the first cycle and with initialization as in (23) and (24) afterwards.

7: Run the MinRes algorithm.

8: Obtain $sol_{min}$.

9: else if $temp \equiv 2 \pmod 4$ then

10: Run the $A_8/B_6$ algorithm until it stops, with initialization

$$x_0 = sol_{min} \qquad (23)$$

$$y = b - A x_0 \qquad (24)$$

11: Run the MinRes algorithm.

12: Obtain $sol_{min}$.

13: else if $temp \equiv 3 \pmod 4$ then

14: Run the $A_8/B_8$ algorithm until it stops, with initialization as in (23) and (24).

15: Run the MinRes algorithm.

16: Obtain $sol_{min}$.

17: else

18: Run the $A_4$ algorithm until it stops, with initialization as in (23) and (24).

19: Run the MinRes algorithm.

20: Obtain $sol_{min}$.

21: end if

22: if $norm_{min} < \epsilon$ then

23: $x = sol_{min}$

24: STOP.

25: end if

26: end for

Switching model from the minimum of the minimum residual norms

This model extends the SLUMinRes algorithm by using the iterate with the minimum of the minimum residual norms of several Lanczos-type algorithms. We consider $A_{12}$, $A_8/B_6$, $A_8/B_8$, and $A_4$. First, we run all of them until they stop due to breakdown, say at the $k_1$th, $k_2$th, $k_3$th, and $k_4$th iterations, respectively. We then collect all iterates of each algorithm together with their residual norms. This gives the sets $S_1$, $S_2$, $S_3$, and $S_4$ of all iterates generated by $A_{12}$, $A_8/B_6$, $A_8/B_8$, and $A_4$, respectively, and the sets $V_1$, $V_2$, $V_3$, and $V_4$ of the corresponding residual norms. Next, using formula (22), we obtain the minimum residual norm of each set $V_i$ and collect these minima in a set $L$. Finally, we compute the minimum value of $L$ and denote it by $\|r_m^{(min)}\|$. The iterate corresponding to this residual norm is used as the switching point. The details are given in Algorithm 4.

Algorithm 4.

The SLUMoM algorithm

1: $x_0 = 0$, $y = b - A x_0$

2: Run the $A_{12}$, $A_8/B_6$, $A_8/B_8$, and $A_4$ algorithms until they stop, apply the MinRes algorithm to each of them, and index them by $i = 1, 2, 3, 4$.

3: Collect the minimum residual norms of all algorithms in Step 2:

$$L = \{\|r_m^{(1)}\|, \|r_m^{(2)}\|, \|r_m^{(3)}\|, \|r_m^{(4)}\|\} \qquad (25)$$

4: Compute the minimum of the collection of minimum residual norms in (25):

$$\|r_m^{(min)}\| = \min L \qquad (26)$$

5: Obtain the approximate solution $x_m^{(min)}$ corresponding to the residual norm in (26).

6: if $\|r_m^{(min)}\| < \epsilon$ then

7: Stop

8: else

9: Initialization:

$$x_0 = x_m^{(min)} \qquad (27)$$

$$y = b - A x_0 \qquad (28)$$

10: Repeat from Step 2 until $\|r_m^{(min)}\| < \epsilon$.

11: Obtain $x_m^{(min)}$.

12: end if

13: STOP.
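One SLUMoM cycle reduces to a minimum over minima. The Python sketch below assumes the four runs have already produced their iterate sets $S_i$ and residual-norm sets $V_i$ (placeholder strings stand in for iterate vectors) and that non-finite norms from breakdown are ignored; the function name is hypothetical.

```python
import math


def min_of_min(runs):
    """Steps 2-5 of Algorithm 4 for one cycle.

    runs maps a solver label to (S_i, V_i): its iterates and residual norms.
    Returns (label, x_m^(min), ||r_m^(min)||).
    """
    L = {}                                            # per-solver minima, as in (25)
    for label, (S, V) in runs.items():
        finite = [(v, idx) for idx, v in enumerate(V) if math.isfinite(v)]
        if finite:
            v, idx = min(finite)                      # MinRes for this solver, (22)
            L[label] = (S[idx], v)
    if not L:
        raise ValueError("no solver produced a finite residual norm")
    label = min(L, key=lambda name: L[name][1])       # minimum of the minima, (26)
    return label, L[label][0], L[label][1]


runs = {
    "A12": (["a1", "a2"], [1e-2, 1e-4]),
    "A8/B6": (["b1", "b2", "b3"], [5e-3, 3e-7, math.inf]),
    "A8/B8": (["c1"], [math.nan]),
    "A4": (["d1", "d2"], [2e-5, 9e-6]),
}
print(min_of_min(runs))   # ('A8/B6', 'b2', 3e-07)
```

The returned iterate plays the role of $x_m^{(min)}$ in (27); the four runs feeding `runs` are independent, which is what the parallel version exploits.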

Results and discussion

The theoretical results

Our proposed models, SLULast, SLUMinRes, and SLUMoM, not only yield good approximate solutions, but can also reduce the number of iterations compared with traditional switching. However, we need to justify theoretically that our models consistently reach low residual norms. We adopt the approach of refs.^8,25 to show that switching converges, that is, yields an iterate whose residual norm is smaller than a given tolerance.

Suppose we solve a system of linear equations (SLE) using SLUMinRes. Let $s \geq 1$ denote the cycle number, where a cycle is one run of a Lanczos-type algorithm within SLUMinRes until it stops. For $s = 1$, consider the $A_{12}$ algorithm of SLUMinRes. Assume that it breaks down at iteration $k + 1$, so that $x_k^{(1)}$ is obtained, and that the corresponding residual norm is still larger than the tolerance, that is,

$$\|r_k^{(1)}\| > \epsilon_1 \qquad (29)$$

for some $\epsilon_1 > 0$. Then we continue to the second cycle ($s = 2$), that is, we run the $A_8/B_6$ algorithm of SLUMinRes. Again, assume that $A_8/B_6$ stops at iteration $m + 1$, giving $x_m^{(2)}$. Repeating these steps yields a finite sequence of iterates

$$\{x_k^{(1)}, x_m^{(2)}, \dots, x_l^{(s)}\}$$

According to the definition of Krylov subspace methods given by Saad,¹ the iterate $x_m^{(2)}$ of the second cycle of SLUMinRes satisfies the two conditions

$$x_m^{(2)} - x_0 \in K_k(A, r_0) \qquad (30)$$

$$r_m^{(2)} = b - A x_m^{(2)} \perp K_k(A^T, y) \qquad (31)$$

where the switching point is $x_0 = x_k^{(1)}$, so that $r_0 = r_k^{(1)}$. Substituting $x_0 = x_k^{(1)}$ into (30) yields

$$x_m^{(2)} - x_k^{(1)} \in K_k(A, r_k^{(1)}) \qquad (32)$$

Following (3), we have

$$r_m^{(2)} = r_k^{(1)} + \alpha_1 A r_k^{(1)} + \alpha_2 A^2 r_k^{(1)} + \dots + \alpha_k A^k r_k^{(1)} = P_k(A)\, r_k^{(1)} \qquad (33)$$

where $P_k(A) = 1 + \alpha_1 A + \dots + \alpha_k A^k$.

Taking the norm of $r_m^{(2)}$ in (33) yields

$$\|r_m^{(2)}\| = \|P_k(A)\, r_k^{(1)}\| \leq \|P_k(A)\|\, \|r_k^{(1)}\| \qquad (34)$$

If this residual norm is still larger than the tolerance $\epsilon_1$, we switch again, and the third cycle gives

$$r_l^{(3)} = P_k(A)\, r_m^{(2)} \qquad (35)$$

Taking the norm of $r_l^{(3)}$ and using (34),

$$\|r_l^{(3)}\| = \|P_k(A)\, r_m^{(2)}\| \leq \|P_k(A)\|\, \|r_m^{(2)}\| \leq \|P_k(A)\|^2\, \|r_k^{(1)}\| \qquad (36)$$

Continuing this procedure, for the $s$th cycle we have

$$\|r_j^{(s)}\| = \|P_k(A)\, r_j^{(s-1)}\| \leq \|P_k(A)\|^{s-1}\, \|r_k^{(1)}\| \qquad (37)$$

where $j = 1, 2, \dots$ denotes the index of the iterate obtained in the $s$th cycle. From (34), (36), and (37), provided that $\|P_k(A)\| \leq 1$, we conclude that

$$\|r_j^{(s)}\| \leq \|r_j^{(s-1)}\| \leq \dots \leq \|r_j^{(1)}\| \qquad (38)$$

where $s \geq 1$ is the number of cycles.
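The bounds (34) and (36) chain into $\|r^{(s)}\| \leq \|P_k(A)\|^{s-1}\|r_k^{(1)}\|$. If one assumes a uniform contraction factor $\|P_k(A)\| \leq \rho < 1$ in every cycle (an assumption, not something the argument above establishes), this bound tells how many cycles guarantee a tolerance $\epsilon$. A short Python sketch with a hypothetical function name:

```python
def cycles_to_tolerance(r1, rho, eps, max_cycles=10_000):
    """Smallest cycle s with rho**(s-1) * r1 < eps.

    Assumes ||P_k(A)|| <= rho < 1 in every cycle, so that the chained bounds give
    ||r^(s)|| <= rho**(s-1) * ||r^(1)||; only the bound is tracked, not a true residual.
    """
    if not 0 <= rho < 1:
        raise ValueError("the bound only contracts for 0 <= rho < 1")
    bound, s = r1, 1
    while bound >= eps:
        if s >= max_cycles:
            raise RuntimeError("tolerance not reached within max_cycles")
        bound *= rho
        s += 1
    return s


print(cycles_to_tolerance(1.0, 0.5, 1e-3))   # 11, since 0.5**10 < 1e-3 <= 0.5**9
```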

We summarize the argument above in the following theorem.

Theorem

Suppose we solve an SLE using the SLULast or SLUMinRes algorithm. The iterates generated by the algorithm have the following properties.

The sequence of iterates generated by SLULast or SLUMinRes, $\{x_k^{(1)}, x_m^{(2)}, \dots, x_j^{(s)}\}$, is finite, where $s$ is the number of cycles and $k, m, \dots, j$ denote the numbers of iterations performed before the respective Lanczos-type algorithms break down.

Provided that $\|P_k(A)\| \leq 1$ in every cycle, the associated residual norms do not increase as the number of cycles increases:

$$\|r_j^{(s)}\| \leq \dots \leq \|r_m^{(2)}\| \leq \|r_k^{(1)}\|$$

If, moreover, $\|P_k(A)\| \leq \rho < 1$ in every cycle, then $\|r_j^{(s)}\| \leq \rho^{s-1} \|r_k^{(1)}\|$, so for any $\epsilon > 0$ the tolerance $\|r_j^{(s)}\| \leq \epsilon$ is reached after finitely many cycles. In other words, under these conditions SLUMinRes improves the bound on the residual norm in each cycle.

Numerical results

We implemented all the switching models, SLULast, SLUMinRes, and SLUMoM, and tested them on several SLEs with dimensions ranging from $1000$ to $20{,}000$. All experiments were carried out in MATLAB R2017b under Windows 8 on an Intel Core i7 processor with 16 GB of RAM. The test problems were used to compare our algorithms against an existing switching algorithm.

The model problem used for testing our algorithms is based on the Poisson equation on the open unit square $\Omega = (0, 1) \times (0, 1)$, that is, $-\frac{\partial^2 u}{\partial x^2} - \frac{\partial^2 u}{\partial y^2} = f$. Using the finite difference method (FDM),²⁶ the model is transformed into a system of linear equations $A x = b$, with the matrix $A$ of the form

$$A = \begin{pmatrix} B & -I & & & 0 \\ -I & B & -I & & \\ & \ddots & \ddots & \ddots & \\ & & -I & B & -I \\ 0 & & & -I & B \end{pmatrix} \qquad (39)$$

where

$$B = \begin{pmatrix} 4 & \alpha & & & 0 \\ \beta & 4 & \alpha & & \\ & \ddots & \ddots & \ddots & \\ & & \beta & 4 & \alpha \\ 0 & & & \beta & 4 \end{pmatrix} \qquad (40)$$

with $\alpha = -1 + \delta$, $\beta = -1 - \delta$, and $\delta = 0.2$. Since $\alpha \neq \beta$ for $\delta \neq 0$, the matrix $A$ is nonsymmetric. All results are presented in the tables and figures below.
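For experimenting with the switching models, the test matrix (39)-(40) can be assembled in a simple sparse (coordinate dictionary) format. The Python sketch below assumes $B$ is $N \times N$ and $A$ has $N \times N$ blocks, so $n = N^2$; the text does not state how the block size is chosen for dimensions such as $n = 1000$ that are not perfect squares, so that part is left open. Function names are hypothetical.

```python
def build_matrix(N, delta=0.2):
    """Sparse A of (39)-(40) as a dict {(row, col): value}, with N blocks of size N (n = N*N)."""
    alpha, beta = -1.0 + delta, -1.0 - delta
    n = N * N
    A = {}
    for i in range(n):
        A[(i, i)] = 4.0                      # diagonal of B
        if (i + 1) % N != 0:                 # stay inside the current diagonal block
            A[(i, i + 1)] = alpha            # superdiagonal of B
            A[(i + 1, i)] = beta             # subdiagonal of B
        if i + N < n:
            A[(i, i + N)] = -1.0             # -I blocks next to B
            A[(i + N, i)] = -1.0
    return A


def sparse_matvec(A, x):
    """y = A x for the dict-of-entries format above."""
    y = [0.0] * len(x)
    for (i, j), a in A.items():
        y[i] += a * x[j]
    return y


A = build_matrix(3)                 # n = 9
b = sparse_matvec(A, [1.0] * 9)     # right-hand side for the exact solution x = [1, ..., 1]^T
```

With $b = A x$ built this way for a chosen exact solution, any of the switching drivers can be tested against a known answer, as done in the experiments below.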

Comparison of traditional switching and the proposed models

In this test, the three switching models are compared for solving the system with matrix (39); Table 1 also includes GMRES and the traditional switching strategy. To validate the models, we use two exact solutions, $x = [1, 1, \dots, 1]^T$ and a vector $x$ of random values between 0 and 1, and set $b = A x$. The results are presented in Tables 1 and 2, respectively. The residual norms against the number of iterations for SLUMinRes and SLUMoM are shown in Figures 1 and 2.

Figure 1.

Performance of SLUMinRes and SLUMoM on SLEs of dimensions 1000–5000 for $x = [1, 1, \dots, 1]^T$.

Figure 2.

Performance of SLUMinRes and SLUMoM on SLEs of dimensions 1000–10,000 for $x = [1, 1, \dots, 1]^T$.

Table 1.

Comparison of GMRES, traditional switching, and the switching models SLULast, SLUMinRes, and SLUMoM, with $x = [1, 1, \dots, 1]^T$. Note: a cycle means one run of an algorithm for an unfixed number of iterations, so different cycles may have different numbers of iterations.

Dim	GMRES		Traditional switching			SLULast			SLUMinRes			SLUMoM
$n$	$\|r_{\mathrm{GMRES}}\|$	$T$ (s)	$\|r_{\mathrm{trad}}\|$	$T$ (s)	Cycles*	$\|r_{\mathrm{SLULast}}\|$	$T$ (s)	Cycles*	$\|r_{\mathrm{SLUMinRes}}\|$	$T$ (s)	Cycles*	$\|r_{\mathrm{SLUMoM}}\|$	$T$ (s)	Cycles*
$1000$	$1.2000 \times 10^{- 10}$	$14.56$	$7.6478 \times 10^{- 14}$	$0.3011$	25	$9.8384 \times 10^{- 14}$	$1.48$	$53$	$9.070 \times 10^{- 14}$	$0.18$	$6$	$8.4954 \times 10^{- 14}$	0.49	7
$2000$	$3.3561 \times 10^{-10}$	$17.31$	NaN	NA	NA	$6.1431 \times 10^{-14}$	$0.81$	$21$	$9.2541 \times 10^{-14}$	0.23	5	$9.8546 \times 10^{-14}$	$0.81$	$6$
$3000$	$6.0192 \times 10^{- 10}$	$19.22$	$9.7652 \times 10^{- 14}$	$1.2911$	$49$	$9.5873 \times 10^{- 14}$	$0.94$	$13$	$8.6372 \times 10^{- 14}$	$0.46$	$7$	$8.9205 \times 10^{- 14}$	$1.08$	$5$
$4000$	$5.4868 \times 10^{- 10}$	$22.82$	$9.3883 \times 10^{- 14}$	$2.4693$	$21$	$9.4484 \times 10^{- 14}$	$1.94$	$21$	$9.4739 \times 10^{- 14}$	$0.65$	$7$	$8.7248 \times 10^{- 14}$	1.91	6
$5000$	$1.6762 \times 10^{-9}$	$27.48$	NaN	NA	NA	$5.0454 \times 10^{-14}$	$2.15$	$17$	$9.9558 \times 10^{-14}$	$0.80$	$7$	$7.8395 \times 10^{-14}$	$3.11$	$6$
$6000$	$8.7208 \times 10^{- 10}$	$30.35$	$3.8866 \times 10^{- 13}$	$6.3285$	25	$9.1492 \times 10^{- 14}$	$2.11$	13	$9.7370 \times 10^{- 14}$	$1.09$	7	$7.2438 \times 10^{- 14}$	3.99	6
$7000$	$2.4435 \times 10^{- 09}$	$30.32$	$8.5202 \times 10^{- 13}$	$8.3178$	45	$7.5309 \times 10^{- 14}$	$2.14$	$9$	$9.9183 \times 10^{- 14}$	$1.32$	$7$	$9.6046 \times 10^{- 14}$	4.83	6
$8000$	$4.1192 \times 10^{- 09}$	$32.75$	$8.8412 \times 10^{- 13}$	$11.0737$	$13$	$9.3624 \times 10^{- 14}$	$5.78$	$21$	$9.0890 \times 10^{- 14}$	$1.36$	$7$	$7.5590 \times 10^{- 14}$	6.23	6
$9000$	$9.1040 \times 10^{-9}$	$35.06$	NaN	$21.6507$	$91$	$8.6605 \times 10^{-14}$	$6.83$	$21$	$8.5282 \times 10^{-14}$	$1.69$	$7$	$1.5134 \times 10^{-14}$	7.41	6
$10{,}000$	$7.9731 \times 10^{-9}$	$37.44$	NaN	$32.3252$	$91$	$8.8317 \times 10^{-14}$	$7.01$	$21$	$9.1992 \times 10^{-14}$	$1.98$	$7$	$8.1808 \times 10^{-14}$	8.77	6
$20{,}000$	$1.6169 \times 10^{-8}$	$57.22$	NaN	$123.1609$	$247$	$9.1437 \times 10^{-14}$	$32.29$	$29$	$7.8245 \times 10^{-14}$	$6.67$	$7$	$9.2115 \times 10^{-14}$	31.73	7

Table 2.

Comparison of the switching models SLULast, SLUMinRes, and SLUMoM, with $x = \mathrm{rand}[0:1]$. Note: a cycle means one run of an algorithm for an unfixed number of iterations, so different cycles may have different numbers of iterations.

Dim	SLULast			SLUMinRes			SLUMoM
$n$	$\|r_{\mathrm{SLULast}}\|$	$T$ (s)	Cycles*	$\|r_{\mathrm{SLUMinRes}}\|$	$T$ (s)	Cycles*	$\|r_{\mathrm{SLUMoM}}\|$	$T$ (s)	Cycles*
$1000$	$7.9203 \times 10^{- 14}$	$0.60$	$21$	$9.1 \times 10^{- 14}$	$0.18$	$6$	$9.9099 \times 10^{- 14}$	0.49	5
$2000$	$7.8411 \times 10^{- 14}$	$0.72$	$17$	$8.4775 \times 10^{- 14}$	0.52	6	$8.6858 \times 10^{- 14}$	1.00	$6$
$3000$	$9.0927 \times 10^{- 14}$	$1.22$	8	$8.4775 \times 10^{- 14}$	$0.52$	$6$	$8.2867 \times 10^{- 14}$	1.59	$5$
$4000$	$5.2788 \times 10^{- 14}$	$1.89$	$17$	$7.8220 \times 10^{- 14}$	$0.79$	$7$	$7.4683 \times 10^{- 14}$	2.59	5
$5000$	$9.4197 \times 10^{- 14}$	$5.38$	$33$	$8.5509 \times 10^{- 14}$	$1.15$	$7$	$6.8163 \times 10^{- 14}$	3.73	$5$
$6000$	$7.9125 \times 10^{- 14}$	$2.57$	17	$9.9132 \times 10^{- 14}$	$1.40$	7	$9.8407 \times 10^{- 14}$	4.61	5
$7000$	$9.8105 \times 10^{- 14}$	$7.96$	$33$	$8.9288 \times 10^{- 14}$	$1.82$	$7$	$6.8539 \times 10^{- 14}$	6.55	6
$8000$	$6.3247 \times 10^{- 14}$	$7.05$	$25$	$8.5822 \times 10^{- 14}$	$2.02$	$7$	$8.2632 \times 10^{- 14}$	8.26	6
$9000$	$8.0582 \times 10^{- 14}$	$15.50$	$49$	$9.3121 \times 10^{- 14}$	$2.47$	$7$	$4.9153 \times 10^{- 14}$	10.28	6
$10{,}000$	$9.3455 \times 10^{-14}$	$6.73$	$21$	$8.7338 \times 10^{-14}$	$2.84$	$7$	$9.4051 \times 10^{-14}$	12.03	6
$20{,}000$	$8.8768 \times 10^{-14}$	$13.20$	$29$	$8.5692 \times 10^{-14}$	$8.31$	$7$	$9.0975 \times 10^{-14}$	35.70	6

Parallel of switching models

The switching models are still time-consuming for large-scale SLEs, particularly on single-processor machines, so a parallel environment is needed. In general, a sequential program is split into several sub-programs that are executed on several nodes of a parallel machine. The time the parallel machine needs to finish the whole task is determined by its slowest core, and, by Amdahl’s law, the achievable speedup is limited by the portion of the program that must run sequentially.¹⁵ Parallel algorithms for solving the 1D heat equation with MPI and Open MPI have been described previously.¹⁶ For high-dimensional SLEs in particular, subsets of the equations can be distributed over several computing machines or cores. One such approach uses accelerated projection-based consensus.¹⁷ Other approaches rely on parallel iterative methods, for instance those of Sultanov et al.,¹⁸ Ma et al.,¹⁹ and Anzt et al.²⁰ Recent parallel implementations of GMRES running on NVIDIA GPGPU accelerators for solving SLEs with sparse matrices were discussed by Minin et al.²¹

In this study, we parallelize using the Parallel Computing Toolbox (PCT) provided by MATLAB. In particular, we use the parfor-loop, a built-in construct available with the PCT. The parfor-loop is useful when many loop iterations of a simple calculation are required: it divides the loop iterations into groups so that each worker executes a portion of the total number of iterations. The parfor-loop is also useful when the loop iterations take a long time to execute, because the workers can execute iterations simultaneously. In principle, when using the parfor-loop, data are sent from the client to the workers, and the results are sent back to the client and pieced together. This is illustrated in Figure 3.

Figure 3.

The concept of parallel tasks on several CPUs.²²

In relation to our switching models, the model most suitable for the parfor-loop principle is the SLUMoM algorithm, because its sub-programs ($A_{12}$, $A_8/B_6$, $A_8/B_8$, and $A_4$) can be run independently before the best iterate is selected as the switching point. Therefore, in this section, we consider the parallel version of SLUMoM, which we call pSLUMoM. The parallel program is illustrated in Figure 4.

Figure 4.

The parallel scheme of the pSLUMoM algorithm.
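The structure of pSLUMoM, independent runs followed by a reduction, can be mirrored outside MATLAB. In this hedged Python sketch, `concurrent.futures.ThreadPoolExecutor` plays the role of the parfor workers (four by default, matching the number of workers used here), and the callables are hypothetical stand-ins for the Lanczos-type runs. Threads only give a real speedup when the solvers release Python's global interpreter lock (for example inside compiled numerical libraries); pure-Python solvers would need a `ProcessPoolExecutor` instead.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def run_in_parallel(solvers, A, b, x0, max_workers=4):
    """Run independent solvers concurrently, then pick the overall best iterate.

    solvers maps a label to a callable f(A, b, x0) -> list of (iterate, residual norm).
    Returns (label, iterate, norm) with the smallest finite norm, or None if none is finite.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {label: pool.submit(f, A, b, x0) for label, f in solvers.items()}
        results = {label: fut.result() for label, fut in futures.items()}
    best = None
    for label, pairs in results.items():
        for x, nrm in pairs:
            if math.isfinite(nrm) and (best is None or nrm < best[2]):
                best = (label, x, nrm)
    return best


# Hypothetical stand-ins for the Lanczos-type runs
solvers = {
    "A12": lambda A, b, x0: [([1.0], 1e-3)],
    "A4": lambda A, b, x0: [([2.0], 1e-6), ([3.0], math.nan)],
}
print(run_in_parallel(solvers, None, None, None))   # ('A4', [2.0], 1e-06)
```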

Comparison of SLUMoM and pSLUMoM

Using the same test problems, we present the numerical results of the switching model SLUMoM and its parallel version, pSLUMoM. We computed the speedup of the parallel scheme over the sequential one when solving several SLEs. The results are visualized in Figures 5 and 6 for the true solutions $x = [1, 1, \dots, 1]^T$ and $x = \mathrm{rand}[0:1]$, respectively.

Figure 5.

Computational time of the switching model based on the minimum of minimum residual norms (SLUMoM) and its parallel version (pSLUMoM) for solving $A x = b$, where $x = [1, 1, \dots, 1]^T$.

Figure 6.

Computational time of the switching model based on the minimum of minimum residual norms (SLUMoM) and its parallel version (pSLUMoM) for solving $A x = b$, where $x = \mathrm{rand}[0:1]$.

Discussion

Overall, the proposed switching models, SLULast, SLUMinRes, and SLUMoM, performed better than traditional switching in terms of accuracy, efficiency, and robustness. As Tables 1 and 2 show, all switching models consistently obtained approximate solutions with residual norms of order $10^{-14}$, but SLUMinRes required less computational time than SLULast and SLUMoM. Interestingly, SLUMoM used fewer cycles than SLUMinRes in most cases. Its computational time is nevertheless higher than that of SLUMinRes, because all four algorithms are run in each cycle to find the lowest residual norm among them. This motivates the construction of a parallel version, which we call pSLUMoM. It would also be interesting to apply our methods to nonlinear problems such as those considered by Noon²³ and Rasheed and Kadhim.²⁴

According to Tables 3 and 4, pSLUMoM reduced the computational time of the sequential SLUMoM by up to about half, with speedups between 1.1 and 2.3 times. This is a promising start, given that we used only MATLAB's parfor construct with a limited number of workers, namely four. Better results might be obtained with other parallel platforms and more workers.
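An idealized cost model shows why four independent sub-programs cap the gain. This is an illustrative assumption, not a model fitted to the measurements. Both versions are assumed to need the same number of cycles $C$. In cycle $c$, sub-program $i \in \{1, \dots, 4\}$ takes time $T_i^{(c)}$ and selecting the best iterate takes $T_{\mathrm{sel}}^{(c)}$. The parallel version adds an overhead $T_{\mathrm{ovh}}^{(c)} \ge 0$ for distributing work to the workers. Then

$$
\begin{aligned}
T_{\mathrm{seq}} &\approx \sum_{c=1}^{C} \Big( \sum_{i=1}^{4} T_i^{(c)} + T_{\mathrm{sel}}^{(c)} \Big), \qquad
T_{\mathrm{par}} \approx \sum_{c=1}^{C} \Big( \max_{1 \le i \le 4} T_i^{(c)} + T_{\mathrm{sel}}^{(c)} + T_{\mathrm{ovh}}^{(c)} \Big), \\
S &= \frac{T_{\mathrm{seq}}}{T_{\mathrm{par}}}
\le \max_{1 \le c \le C} \frac{\sum_{i=1}^{4} T_i^{(c)} + T_{\mathrm{sel}}^{(c)}}{\max_{i} T_i^{(c)} + T_{\mathrm{sel}}^{(c)} + T_{\mathrm{ovh}}^{(c)}}
\le 4 .
\end{aligned}
$$

The first inequality holds because a ratio of sums of positive terms never exceeds the largest term-wise ratio. The second holds because $\sum_{i} T_i^{(c)} \le 4 \max_{i} T_i^{(c)}$. The ceiling of 4 is reached only when the four sub-programs cost the same and selection and overhead are negligible. Within this model, going beyond four would also require parallelizing the work inside each sub-program, such as its matrix-vector products.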

Table 3.

Comparison of the switching model SLUMoM and its parallel version pSLUMoM for $x = [1, 1, \dots, 1]^{T}$.

$n$	$\|r_{\mathrm{SLUMoM}}\|$	$T_{\mathrm{SLUMoM}}$ (s)	Cycles* (SLUMoM)	$\|r_{\mathrm{pSLUMoM}}\|$	$T_{\mathrm{pSLUMoM}}$ (s)	Cycles* (pSLUMoM)	Speedup (×)
1000	$8.4954 \times 10^{-14}$	0.49	7	$8.4954 \times 10^{-14}$	0.31	7	1.6
2000	$9.8546 \times 10^{-14}$	0.81	6	$9.8546 \times 10^{-14}$	0.51	6	1.6
3000	$8.9205 \times 10^{-14}$	1.08	5	$8.9205 \times 10^{-14}$	1.00	5	1.1
4000	$8.7248 \times 10^{-14}$	1.91	6	$8.7248 \times 10^{-14}$	1.08	6	1.8
5000	$7.8395 \times 10^{-14}$	3.11	6	$7.8395 \times 10^{-14}$	2.15	6	1.4
6000	$7.2438 \times 10^{-14}$	3.99	6	$7.2438 \times 10^{-14}$	2.34	6	1.7
7000	$9.6046 \times 10^{-14}$	4.83	6	$9.6046 \times 10^{-14}$	3.18	6	1.5
8000	$7.5590 \times 10^{-14}$	6.23	6	$7.5590 \times 10^{-14}$	4.03	7	1.5
9000	$1.5134 \times 10^{-14}$	7.41	6	$1.5134 \times 10^{-14}$	5.22	6	1.4
10,000	$8.1808 \times 10^{-14}$	8.77	6	$8.1808 \times 10^{-14}$	6.27	7	1.4

Table 4.

Comparison of the switching model SLUMoM and its parallel version pSLUMoM for $x = \mathrm{rand}[0:1]$.

$n$	$\|r_{\mathrm{SLUMoM}}\|$	$T_{\mathrm{SLUMoM}}$ (s)	Cycles* (SLUMoM)	$\|r_{\mathrm{pSLUMoM}}\|$	$T_{\mathrm{pSLUMoM}}$ (s)	Cycles* (pSLUMoM)	Speedup (×)
1000	$9.9099 \times 10^{-14}$	0.49	5	$9.9099 \times 10^{-14}$	0.38	5	1.3
2000	$8.6858 \times 10^{-14}$	1.00	6	$8.6858 \times 10^{-14}$	0.71	6	1.4
3000	$8.2867 \times 10^{-14}$	1.59	5	$8.2867 \times 10^{-14}$	0.94	5	1.7
4000	$7.4683 \times 10^{-14}$	2.59	5	$7.4683 \times 10^{-14}$	1.44	5	1.8
5000	$6.8163 \times 10^{-14}$	3.73	5	$6.8163 \times 10^{-14}$	1.66	5	2.3
6000	$9.8407 \times 10^{-14}$	4.61	5	$9.8407 \times 10^{-14}$	2.37	5	1.9
7000	$6.8539 \times 10^{-14}$	6.55	6	$6.8539 \times 10^{-14}$	2.87	6	2.3
8000	$8.2632 \times 10^{-14}$	8.26	6	$8.2632 \times 10^{-14}$	4.55	6	1.8
9000	$4.9153 \times 10^{-14}$	10.28	6	$4.9153 \times 10^{-14}$	4.58	6	2.2
10,000	$9.4051 \times 10^{-14}$	12.03	6	$9.4051 \times 10^{-14}$	5.18	6	2.3
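The speedup column is the ratio $S = T_{\mathrm{SLUMoM}} / T_{\mathrm{pSLUMoM}}$, and the fraction of time saved is $1 - 1/S$. The short Python check below recomputes both from the times in Table 4. The times are copied from the table; the helper names `speedup` and `time_saved` are illustrative.

```python
# Wall-clock times in seconds, copied from Table 4 (x = rand[0:1]).
dims = [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000]
t_seq = [0.49, 1.00, 1.59, 2.59, 3.73, 4.61, 6.55, 8.26, 10.28, 12.03]  # SLUMoM
t_par = [0.38, 0.71, 0.94, 1.44, 1.66, 2.37, 2.87, 4.55, 4.58, 5.18]    # pSLUMoM


def speedup(t_sequential: float, t_parallel: float) -> float:
    """How many times faster the parallel run is."""
    return t_sequential / t_parallel


def time_saved(t_sequential: float, t_parallel: float) -> float:
    """Fraction of the sequential time removed by parallelization (1 - 1/S)."""
    return 1.0 - t_parallel / t_sequential


for n, ts, tp in zip(dims, t_seq, t_par):
    print(f"n = {n:>6}: speedup {speedup(ts, tp):.2f}x, time saved {time_saved(ts, tp):.0%}")
```

For $n = 10{,}000$ this gives a speedup of about 2.32 and a saving of about 57%. Values recomputed from the rounded times can differ slightly from the table's speedup column; for $n = 5000$ they give about 2.25.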

Conclusion

The new switching approaches take into account the iterate with the minimum residual norm and the iterate with the minimum of the minimum residual norms preceding breakdown. Unlike the conventional strategy, they also do not limit the number of iterations before switching. These procedures yield three new algorithms: SLULast, SLUMinRes, and SLUMoM.
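The three models differ in which stored iterate is passed to the next algorithm. The following minimal Python sketch shows that choice. It assumes each run is recorded as a list of (residual norm, iterate) pairs. The function `switching_point`, the record format and the sample residual values are hypothetical. The sketch does not cover the Lanczos-type algorithms themselves or the unlimited iteration count.

```python
from operator import itemgetter
from typing import Dict, List, Tuple

# One record per iterate: (residual norm, label). Labels stand in for the
# iterate vectors that a real implementation would store instead.
Record = Tuple[float, str]
by_norm = itemgetter(0)


def switching_point(runs: Dict[str, List[Record]], current: str, model: str) -> Record:
    """Return the record from which the next Lanczos-type algorithm restarts.

    runs    -- iterates produced by each algorithm in this cycle, oldest first
    current -- the algorithm that was running when it stopped or broke down
    model   -- "SLULast", "SLUMinRes" or "SLUMoM"
    """
    if model == "SLULast":      # last iterate of the current algorithm
        return runs[current][-1]
    if model == "SLUMinRes":    # iterate with the minimum residual norm
        return min(runs[current], key=by_norm)
    if model == "SLUMoM":       # minimum over every algorithm's minimum
        return min((min(hist, key=by_norm) for hist in runs.values()), key=by_norm)
    raise ValueError(f"unknown model: {model}")


# Hypothetical residual histories for two of the algorithms.
runs = {
    "A12": [(1e-2, "x1"), (3e-5, "x2"), (4e-3, "x3")],
    "A8/B6": [(5e-3, "y1"), (8e-6, "y2"), (9e-4, "y3")],
}
print(switching_point(runs, "A12", "SLULast"))    # (0.004, 'x3')
print(switching_point(runs, "A12", "SLUMinRes"))  # (3e-05, 'x2')
print(switching_point(runs, "A12", "SLUMoM"))     # (8e-06, 'y2')
```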

The results showed that SLUMinRes and SLUMoM solved several SLEs more efficiently and more accurately than SLULast and the conventional switching strategy. This supports the view that the iterate with the lowest residual norm is the best point from which to switch. For SLUMoM in particular, a parallel version was built with MATLAB's parfor to reduce the computational time.

This study also leaves open questions. The performance of SLUMinRes does not always follow the theoretical results: in a few cycles of SLUMinRes, the residual norms still fluctuate rather than decrease. In our view, this is because the orthogonal polynomials $P_{k}$ are not truly orthogonal in practice, owing to errors that sometimes accumulate and at other times cancel out. Another issue is that pSLUMoM still needs considerable time to converge. We suggest that other parallel platforms, such as CUDA, be considered to reduce it further. Finally, nonlinear problems whose discretization leads to linear systems are a promising direction for further investigation.
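Loss of orthogonality through accumulated floating-point errors is a well-known behaviour of Lanczos-type recurrences. The Python sketch below illustrates the general effect only, not the formal-orthogonal-polynomial algorithms in this paper. It runs the symmetric Lanczos process on a hypothetical diagonal test matrix and records how far the computed basis drifts from orthonormality. The function name and the test matrix are assumptions.

```python
import numpy as np


def lanczos_orthogonality_loss(A, v0, k):
    """Run k steps of the symmetric Lanczos recurrence without reorthogonalization.

    Returns loss[j] = max |Q^T Q - I| over the first j + 1 Lanczos vectors.
    """
    n = A.shape[0]
    Q = np.zeros((n, k))
    q, q_prev, beta = v0 / np.linalg.norm(v0), np.zeros(n), 0.0
    loss = []
    for j in range(k):
        Q[:, j] = q
        w = A @ q - beta * q_prev          # three-term recurrence
        alpha = q @ w
        w -= alpha * q
        beta = np.linalg.norm(w)
        Qj = Q[:, : j + 1]
        loss.append(np.max(np.abs(Qj.T @ Qj - np.eye(j + 1))))
        if beta == 0.0:                    # exact invariant subspace: stop
            break
        q_prev, q = q, w / beta
    return loss


# Hypothetical test matrix: diagonal, eigenvalues spread from 1 to 1e4.
n = 100
A = np.diag(np.logspace(0, 4, n))
loss = lanczos_orthogonality_loss(A, np.ones(n), n)
print(f"after 5 steps: {loss[4]:.1e}, after {n} steps: {loss[-1]:.1e}")
```

In exact arithmetic every entry of `loss` would be zero. In floating point, the early values stay near machine precision, while the later ones grow once Ritz values converge.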

Footnotes

Acknowledgements

We thank Prof. Abdellah Salhi for the idea of modified switching. We also thank Universiti Malaysia Terengganu for all the facilities provided during this research.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This project is supported financially by Research Intensified Grant Scheme (RIGS) Universiti Malaysia Terengganu Phase 1/2019, Number RIGS/1/2018/ICT04/UMT/02/1.

ORCID iD

Maharani A Bakar

References

1. Saad. Krylov subspace methods for solving large unsymmetric linear systems. Math Comput 1981; 37: 105–126.

2. Saad. Iterative Methods for Sparse Linear Systems. Philadelphia: Society for Industrial and Applied Mathematics, 2003.

3. Brezinski, Zaglia, Sadok. A review of formal orthogonality in Lanczos-based methods. J Comput Appl Math 2002; 140: 81–98.

4. Lanczos. Solution of systems of linear equations by minimized iterations. J Res Natl Bur Stand (1934) 1952; 49: 33–53.

5. Brezinski, Sadok. Lanczos-type algorithms for solving systems of linear equations. Appl Numer Math 1993; 11: 443–473.

6. Baheux. New implementations of Lanczos method. J Comput Appl Math 1995; 57: 3–15.

7. Farooq, Salhi. A switching approach to avoid breakdown in Lanczos-type algorithms. Appl Math Inform Sci 2014; 8: 2161–2169.

8. Maharani A Bakar, Salhi. Restarting from specific points to cure breakdown in Lanczos-type algorithms. J Math Fund Sci 2015; 47: 167–184.

9. Maharani A Bakar, Salhi, Suharto. Introduction of interpolation and extrapolation models in Lanczos-type algorithms A13B6 and A13B10 to enhance their stability. J Math Fund Sci 2018; 50: 148–165.

10. Maharani A Bakar, Salhi, Khan. Lanczos-type algorithms with embedded interpolation and extrapolation models for solving large scale systems of linear equations. Int J Comput Sci Math (IJCSM) 2019; 10: 429–442.

11. Maharani A Bakar, Salhi, Khan, et al. Solving large scale systems of linear equations with a stabilized Lanczos-type algorithms running on a cloud computing platform. Hacet J Math Stat 2018; 47: 1730–1741.

12. Thalib, Maharani A Bakar, Ibrahim. Application of support vector regression in Krylov solvers. Ann Emerg Technol Comput (AETiC) 2021; 5(5): 178–186.

13. Thalib, Maharani A Bakar. Simulation of heat flow in welding using hybrid restarting FDM-SVR-Lanczos. AIP Conf Proc 2023; 2484(1): 030003 (IPCOETI 2021, 21–22 June 2021).

14. Maharani A Bakar, Thalib, Juliansyah, et al. High performances of stabilized Lanczos-types for solving high dimension problems: a survey. Int J Math Comput Sci 2021; 2: 837–854.

15. Geshi. The art of high performance computing for computational science. Singapore: Springer, 2019.

16. Trobec, Slivnik, Bulic, et al. Introduction to Parallel Computing. 2nd ed. Switzerland: Springer, 2018. 268 p.

17. Azizan-Ruhi, Lahouti, Avestimehr, et al. Distributed solution of large-scale linear systems via accelerated projection-based consensus. Proc IEEE Int Conf Acoust Speech Signal Process 2018; 67(14): 6358–6362.

18. Sultanov, Akimova, Misilov, et al. Parallel direct and iterative methods for solving the time-fractional diffusion equation on multicore processors. Mathematics 2021; 10: 323.

19. Liu. Developing a multi-GPU-enabled preconditioned GMRES with inexact triangular solves for block sparse matrices. Math Probl Eng 2021; 2021: 1–17.

20. Anzt, Boman, Falgout, et al. Preparing sparse solvers for exascale computing. Philos Trans A Math Phys Eng Sci 2020; 378: 20190053.

21. Minin, Matveev, Fedorov, et al. Benchmarks of CUDA-based GMRES solver for Toeplitz and Hankel matrices and applications to topology optimization of photonic components. Comput Math Model 2021; 32: 438–452.

22. Lee. Programming and Engineering Computing with Matlab. USA: SDC Publications, 2021. 532 p.

23. Noon. Numerical analysis of least-squares group finite element method for coupled Burger's problem. Baghdad Sci J 2021; 18: 1521.

24. Rasheed, Kadhim. Numerical solutions of two-dimensional vorticity transport equation using Crank-Nicolson method. Baghdad Sci J 2022; 19: 321–328.

25. Maharani A Bakar, Salhi. RMEIEMLA: the recent advance in improving the robustness of Lanczos-type algorithms. AIP Conf Proc 2009; 2138(1): 030009 (25–28 March).

26. Qiao, Tang. Numerical Solution of Differential Equations: Introduction to Finite Difference and Finite Element Methods. 1st ed. UK: Cambridge University Press, 2018. 305 p.

Hybrid Lanczos-switching models for solving large linear systems and their parallel versions
