The steepest descent of gradient-based iterative method for solving rectangular linear systems with an application to Poisson’s equation

We introduce an effective iterative method for solving rectangular linear systems, based on gradients along with the steepest descent optimization. We show that the proposed method is applicable with any initial vectors as long as the coefficient matrix is of full column rank. Convergence analysis produces error estimates and the asymptotic convergence rate of the algorithm, which is governed by the term √ 1 – κ–2, where κ is the condition number of the coefficient matrix. Moreover, we apply the proposed method to a sparse linear system arising from a discretization of the one-dimensional Poisson equation. Numerical simulations illustrate the capability and effectiveness of the proposed method in comparison to the well-known and recent methods.


Introduction
Linear systems play an essential role in modern applied mathematics, including numerical analysis, statistics, mathematical physics/biology, and engineering. In this paper, we develop an effective algorithm for solving rectangular linear systems. The proposed algorithm can be applied for most of the scientific models involving differential equations. As a model problem, we concern Poisson's equation, which arises in many applications, for example, electromagnetics, fluid mechanics, heat flow, diffusion, and quantum mechanics.
Let us consider a (square) linear system Ax = b with given A ∈ M n (R) and b ∈ R n . Here we denote the set of m-by-n real matrices by M m,n (R), and for square matrices, we set M n (R) = M n,n (R). For solving linear systems, iterative methods have received much attention. In principle, iterative methods create a sequence of numerical solutions so that starting from an initial approximation, an iteration with sufficiently large number finally becomes an accurate solution. A group of methods for solving the linear system, called stationary iterative methods, can be expressed in the simple form where B is the associated iteration matrix derived from the coefficient matrix A, and c is the vector derived from A and b. The Jacobi method, the Gauss-Seidel (GS) method, and the successive over-relaxation (SOR) method are three classical ones (see, e.g., [1,Ch. 10]) derive by splitting where D is a diagonal matrix, and L (U) is a lower (upper) triangular matrix. The SOR method has received much attention and has been evolved continually into new iterative methods, for example: • Jacobi over-relaxation (JOR) method [2] B = D -1 αL + αU + (1α)D , c = αD -1 b, α > 0.
Another study of solving linear systems considers unconstrained convex optimization, where the gradient method along with the steepest descent is used (see, e.g., [27]). The steepest descent is a gradient algorithm where the step size α k is chosen at each individual iteration to achieve the maximum amount of decrease of the objective function. Suppose we would like to minimize a continuously differentiable function f on R n . To do this, let x k be the current iterate point, and let g k = ∇f (x k ) be the gradient vector at x k . The steepest descent method defines the next iteration by where α * k > 0 satisfies Barzilai and Borwein [28] approached the step size in the current iteration. For the iterative equation (4), the step size α k can be chosen as where s k-1 = x kx k-1 and y k-1 = g kg k-1 . We call such iterative method the BB method. Convergence analysis of the BB method is provided in [28,29]. This idea has encouraged and brought about many researches on the gradient method; see, for example, [30][31][32][33][34].
In the present paper, we propose a new gradient-based iterative algorithm with a sequence of optimal convergent factors for solving rectangular linear systems (see Sect. 2).
Then we make convergence analysis for the proposed algorithm, including the convergence rate and error estimates (see Sect. 3). Numerical experiments are provided to illustrate the capability and effectiveness of the proposed algorithm in comparison to all mentioned algorithms (see Sect. 4). We also apply the algorithm to a sparse linear system arising from a discretization of the one-dimensional Poisson equation (see Sect. 5). Finally, we conclude the paper with some remarks in Sect. 6.

Proposing the algorithm
In this section, we introduce a new method for solving rectangular linear systems based on gradients, and we provide an appropriate sequence of convergent factors that minimizes an error at each iteration.
Consider the rectangular linear system (1) where A ∈ M m,n (R) is a full-column-rank matrix, b ∈ R m is a known constant vector, and x ∈ R n is an unknown vector. We first define the quadratic norm-error function Since A is of full column rank, the consistent system (1) has a unique solution, and hence an optimal vector x * of f exists. We will start by having an arbitrary initial vector x(0), and then at every step k > 0, we iteratively move to the next vector x(k + 1) in an appropriate direction, that is, the negative gradient of f together with a suitable step size τ k+1 . Thus the gradient-based iterative method can be described through the following recursive rule: To minimize the function f , we will deduce their gradients in detail. Indeed, we get Thus our iterative equation is of the form To generate the best step size at each iteration, we recall the technique of the steepest descent, which minimizes the error occurring at each iteration. Thus we consider the error f (x(k + 1)) as a function of τ ≥ 0: To obtain the critical point, we make the differentiation: which gives τ = tr(bc T )/ tr(bb T ). Note that the second derivative of φ k+1 (τ ) is given by tr(bb T ), which is positive. Hence the minimizer of the function φ k+1 (τ ) is We call {τ k+1 } ∞ k=0 the sequence of optimal convergent factors. Now we summarize the search direction and optimal step size.

Algorithm 2.1
The steepest descent of gradient-based iterative algorithm.
Input step: Updating step: Set k := k + 1 and repeat the updating step.
Here we denote by c i the ith entry of c and by C ij the (i, j)th entry of C. In case of stopping the algorithm, a stopping criteria is necessary and can be described where is a small positive number. Note that we introduce the vectors c, d and the matrices C, D to avoid duplicated computations.

Convergence analysis
In this section, we show that Algorithm 2.1 converges to the exact solution for any initial vector. Moreover, we provide the convergence rate, error estimates, and the iteration number corresponding to a given satisfactory error.

Convergence of the algorithm
The convergence analysis is based on a matrix partial order and strongly convex functions. Recall that the Löwner partial order for real symmetric matrices is defined by Using the definition of the partial order , this is equivalent to In other words, m (resp., M) is a lower (resp., upper) bound for the smallest (resp., largest) eigenvalue of ∇ 2 f (x) for all x. (1) is consistent and A is of full column rank, then the sequence {x(k)} generated by Algorithm 2.1 converges to a unique solution for any initial vector x(0).

Theorem 3.2 If system
Proof The hypothesis implies the existence of a unique solution x * for the system. We will show that x(k) → x * as k → ∞. In case the gradient ∇f (x(k)) becomes the zero vector for some k, we have x(k) = x * , and the result holds. So assume that ∇f (x(k)) = 0 for all k. Since Thus f is strongly convex. For convenience, we write λ min and λ max instead of λ min (A T A) and λ max (A T A), respectively. We consider the function φ k+1 (τ ) of the step size τ . Applying Lemma 3.1, we obtain Minimize this inequality over τ . The right-hand side (RHS) is minimized by τ k+1 = 1/λ max , and thus From the other inequality in Lemma 3.1 we have We find that τ = 1/λ min minimizes the RHS, that is, Now ∇f (x(k)) 2 F ≥ 2λ min f (x(k)), and hence by (11) we have Since A is a full-column-rank matrix, the matrix A T A is positive definite, that is, λ > 0 for all λ ∈ σ (A T A). It follows that c := 1λ min /λ max < 1 and By induction we obtain that for any k ∈ N, which shows that f (x(k)) → 0 or, equivalently, x(k) → x * as k → ∞.

Convergence rate and error estimates
From now on, denote κ = κ(A), the condition number of A. According to the proof of Theorem 3.2, bounds (12) and (13) give rise to the following estimates: Moreover, the error estimates Ax(k)b F compared to the previous step and the first step are provided by (14) and (15), respectively. In particular, the relative error at each iteration gets smaller than the previous (nonzero) one. Now we recall the following properties.

Lemma 3.4 (e.g. [1]) For any matrices A and B of proper sizes, we have
In particular, the asymptotic convergence rate of the algorithm is governed by Proof By (15) and Lemma 3.4 we obtain Since the end behavior of this error depends on the term (1κ -2 ) k 2 , the asymptotic rate of convergence for the algorithm is governed by √ 1κ -2 . Similarly, from (14) and Lemma 3.4 we have and thus we get (16).
Hence the condition number κ of the coefficient matrix determines the asymptotic rate of convergence, as well as how far our initial point was from the exact solution. As κ gets closer to 1, the algorithm converges faster.

Proposition 3.6 Let {x(k)} ∞
k=1 be the sequence of vectors generated by Algorithm 2.1. For each > 0, we have Ax(k)b F < after k * iterations for any Besides, for each > 0, we have x(k)x * F < after k * iterations for any Proof From (13) This means precisely that for each > 0, there is a positive integer N such that for all k ≥ N , Taking the logarithm on both sides, we obtain (18). Another result can be obtained in a similar manner; here we start with approximation (17).
From Proposition 3.6 we obtain the iteration number such that the relative error Ax(k)b F and the exact error x(k)x * F have an accuracy to a decimal digit after iterations. Indeed, if p is a satisfactory decimal digit, then the desired iteration number is obtained by substituting = 0.5 × 10 -p .

Remark 3.7 A sharper bound for error estimation is obtained when the coefficient matrix
A is a square matrix. However, the convergence rate is governed by the same value. Indeed, the condition that A is of full column rank is equivalent to the invertibility of A. Using Lemma 3.4, we have the following bound: Since κ ≥ 1, these bounds are sharper than those in (16) and (17).

Numerical simulations for linear systems
In this section, we illustrate the effectiveness and capability of our algorithm. We report the comparison of TauOpt, our proposed algorithm, with the existing algorithms we have presented in the introduction, that is, GI (Proposition 1.1), LS (Proposition 1.2), BB1 (5), and BB2 (6). All iteration results have been carried out by MATLAB R2018a in Intel(R) Core(TM) i7-6700HQ CPU @ 2.60 GHz, RAM 8.00 GB PC environment. To measure the computational time taken for each program, we apply the tic and toc functions in MAT-LAB and abbreviate them CT. The readers are recommended to consider all reported results, such as errors, CTs, figures, while comparing the performance of any algorithms. For each example, unless otherwise stated, we consider the following error at the kth iteration step: We choose an initial vector x(0) = 10 -6 [1 -1] T . Running Algorithm 2.1, we see from Table 1 that the approximated solutions converge to the exact solution x * = [-3 4] T . Fig. 1 and Table 2 show the results when running 100 iterations. We can conclude that Algorithm 2.1 gives the fastest convergence.   Thus the condition number of the iteration matrix depends on a. By taking different values of a we then obtain the results shown in Table 3 and Fig. 2. The simulations reveal that the closer to 1 the condition number, the faster the convergence of the algorithm. This shows the correctness of Theorems 3.3 and 3.5.
Example 4. 3 We consider a larger linear system. We would like to show that for a coefficient matrix that has no appropriate property and makes all approximated solutions from As we can see from Fig. 3, the natural logarithm errors log x(k)x * F for Jacobi, GS, SOR, ESOR, AOR, and JOR diverge, whereas those for our method continue to converge to 0. As a result, the approximated solutions from Algorithm 2.1 converge to the exact solution x * = -1 -3 0 2 4 -6 T with six decimals accuracy using 14,612 iterations and CT = 0.3627. Example 4.4 In this example, we consider a rectangular linear system where its coefficient matrix is of full column rank. We compare Algorithm 2.1 with GI, LS, and BB algorithms. Let The exact solution of the system Ax = b is given by The results of running 100 iterations are provided in Fig. 4 and Table 4. Both show that Algorithm 2.1 outperforms the GI and LS algorithms. On the other hand, both types 1 and 2 of the BB algorithm of are comparable with ours. The BB algorithm gives a better iteration time; however, our algorithm gives a better computation time.   where M = tridiag(-1, 2, -1), N = tridiag(0.5, 0, -0.5), r = 0.01, and n = 16 as in [36]. We choose an initial vector x(0) = [x i ] ∈ R 100 , where x i = 10 -6 for all i = 1, . . . , 100. We take a random vector b ∈ R 100 with every element in [-100, 100]. Since the exact solution is not yet known, it is appropriate to consider the relative error b -Ax(k) F . The numerical results after 500 iterations in comparing our algorithm with GI and LS algorithm are shown in Table 5 and Fig. 5. They reveal that our algorithm performs better than GI and LS methods.

Application to one-dimensional Poisson's equation
We now discuss an application of the proposed algorithm to a sparse linear system arising from the one-dimensional Poisson equation Here u : (α, β) → R is an unknown function to be approximated, and f : (α, β) → R is a given function. The function u must satisfy the Dirichlet boundary conditions u(α) = u(β) = 0. We discretize the problem to solve an approximate solution at N partitioned points x i between α and β: By the centered 2nd-order finite difference approximation we obtain Now we can put it into a linear system T N u = h 2 f , where u = [u 1 u 2 . . . u N ] is an unknown vector, and the coefficient matrix T N = tridiag(-1, 2, -1) ∈ M N (R) is a tridiagonal matrix. Now the proposed algorithm for this sparse linear system is presented as follows. Updating step: 2 , u(k + 1) = u(k) + τ k+1 (s -Su(k)). Set k := k + 1 and repeat the updating step.
Here the stopping criteria is g -T N u(k) F < , where is a small positive number. Since the coefficient matrix T N is a sparse matrix, the error norm can be described more precisely: The eigenvalues of T N are given by λ j = 2(1 -cos jπ N+1 ) for j = 1, . . . , N ; see, for example, [1]. The smallest eigenvalues of T N can be approximated by the second-order Taylor approximation:

Figure 6
The analytical solution (left) and the numerical solution (right) for Ex. 5.3 Thus T N is positive definite with condition number Now, according to Remark 3.7, the convergence analysis of Algorithm 5.1 can be described as follows.

Corollary 5.2
The discretization T N u = h 2 f of the Poisson equation (20) can be solved using Algorithm 5.1 so that the approximated solution u(k) converges to the exact solution u * for any initial vector u(0). The asymptotic convergence rate of the algorithm is governed by √ 1κ -2 , where κ is given by (21).
Thus the convergence rate of the algorithm depends on the number N of partition.
Example 5. 3 We consider an application of our algorithm to the one-dimensional Poisson equation (20) with f (x) = x 2 -2 sin x -4x cos x, 0<x < π, and u = 0 on the boundary of [0, π]. We choose an initial vector u(0) = 2 × ones, where ones is the matrix that contains 1 at every position. We run Algorithm 5.1 with 8 partitioned points, so that the size of the matrix T N is 64 × 64. The analytical solution is u * (x) = x 2 sin x. Figure 6 shows the result of our algorithm (right) compared to the analytical solution (left) after running 1000 iterations with CT = 0.0112 seconds.

Conclusion
A new algorithm, the steepest descent of gradient-based iterative algorithm, is proposed for solving rectangular linear systems. The algorithm is applicable for any rectangular linear systems and any initial points without any conditions, but the coefficient matrix is of full column rank. We use an optimization technique to obtain a new formula for a convergence factor, so that it excellently enhances the algorithm in performance of convergence. The asymptotic rate of convergence is governed by √ 1κ -2 , where κ is the condition number of the coefficient matrix. The numerical simulations in Sect. 4 illustrate the applicability and efficiency of the algorithm compared to all other algorithms mentioned in this paper. The iteration number and the CT indicate that our algorithm is a good choice for solving linear systems. Moreover, the sparse linear system arising from the one-dimensional Poisson equation can be solved efficiently using this algorithm. In our opinion, the techniques of gradients, steepest descent, and convex optimization might be useful for a class of matrix equations such as Lyapunov equation, Sylvester equation, and so on. However, these topics require more studies and can be another further research.