Gradient-descent iterative algorithm for solving a class of linear matrix equations with applications to heat and Poisson equations

In this paper, we introduce a new iterative algorithm for solving a generalized Sylvester matrix equation of the form ∑p t=1 AtXBt = C which includes a class of linear matrix equations. The objective of the algorithm is to minimize an error at each iteration by the idea of gradient-descent. We show that the proposed algorithm is widely applied to any problems with any initial matrices as long as such problem has a unique solution. The convergence rate and error estimates are given in terms of the condition number of the associated iteration matrix. Furthermore, we apply the proposed algorithm to sparse systems arising from discretizations of the one-dimensional heat equation and the two-dimensional Poisson’s equation. Numerical simulations illustrate the capability and effectiveness of the proposed algorithm comparing to well-known methods and recent methods.


Introduction
Linear matrix equations have played a crucial role in control theory and differential equations; see, e.g., [1][2][3][4]. There was much attention given to the following matrix equations: the equation AXB = C, the Sylvester equation AX +XB = C, the Kalman-Yakubovich equation AXB + X = C, and, more generally, the equation AXB + CXD = F. Using the notions of the matrix Kronecker product and the vector operator, we can obtain their exact solutions. However, matrices with high dimensions (e.g., A, B of size 10 2 × 10 2 ) cause their Kronecker product dimension to be very high (10 4 × 10 4 , in that case). The dimension problem leads to a computational difficulty due to exceeding computer memory when computing an inverse of the large matrix.
In 2005, Ding and Chen applied the hierarchical identification principle to develop the gradient-based iterative (GI) algorithms for solving the equation p j=1 A j XB j = C, which includes the Sylvester equation, as follows. Proposition 1.1 ([32]) If the matrix equation p j=1 A j XB j = C has a unique solution X, then the iterative solution X(k) obtained from the gradient-based iterative (GI) algorithm given by X(k) = X 1 (k) + X 2 (k) + · · · + X p (k) /p, converges to the solution X.
In 2008, Ding, Liu, and Ding derived the following three iterative methods for the equation AXB = C, and the equation is such that X(k) → X * . Proposition 1.3 ([33]) If the equation AXB = C has a unique solution X * , then the least squares (LS) iterative algorithm, is such that X(k) → X * .

Proposition 1.4 ([33]) If the matrix equation
p j=1 A j XB j = F has a unique solution X, then the iterative solution X(k) obtained from the least-squares-iterative (LSI) algorithm given by where 0 < μ < 2p, converges to the solution X.
In this paper, we introduce a gradient-descent iterative algorithm for solving the generalized Sylvester equation that takes the form Note that this equation includes all mentioned matrix equations as special cases. The obtained algorithm is based on the vector representation and the variants of the previous works in [32,33]. The algorithm aims to minimize an error at each iteration by the idea of gradient-descent. We show that the proposed algorithm can be applied to any problems with any initial matrices as long as such problem has a unique solution. The convergence rate and error estimates are given in terms of the condition number of the associated iteration matrix. Numerical simulations reveal that our proposed algorithm performs well compared to the mentioned iterative methods. Moreover, our algorithm can be employed to a discretization of famous partial differential equations namely, the one-dimensional heat equation and the two-dimensional Poisson's equation. Both equations are widely used in many areas of theoretical physics, electrostatic and mechanical engineering; see, e.g., [37] and [38]. According to our numerical results, the algorithm is applicable to both heat and Poisson's equations comparing to their analytical solutions. The outline of this paper is as follows. In Sect. 2, we supply auxiliary tools to solve linear matrix equations and to make a convergence analysis of an iterative method for solving such equations. We propose new algorithms for the equations AXB = C and p t=1 A t XB t in Sects. 3 and 4, respectively. In Sect. 5, we presented numerical simulations for various kinds of the linear matrix equations. In Sects. 6 and 7, we apply our algorithm to the onedimensional heat equation and the two-dimensional Poisson's equation, respectively. The numerical simulations for heat and Poisson's equations are provided in their own sections. Finally, we present a conclusion in Sect. 8.

Preliminaries on matrix analysis
Throughout this paper, all considered matrices are real. Denote the set of m×n matrices by M m,n . When m = n, we write M n instead of M n,n . Let I be an identity matrix of compatible dimension. The (i, j)th entry of a matrix A is denoted by A(i, j) or a ij .
Recall the Löwner partial order for real symmetric matrices: The Kronecker (tensor) product of A = [a ij ] ∈ M m,n and B ∈ M p,q is defined by The vector operator is defined for each A = [a ij ] ∈ M m,n by Vec(A) = a 11 · · · a m1 a 12 · · · a m2 · · · a 1n · · · a mn T .
It is clear that the vector operator is linear and injective.

Lemma 2.1 ([39])
The Kronecker product and the vector operator posses the following properties provided that all matrices are compatible: To perform convergence analysis, the spectral norm, the Frobenius norm, and the condition number of A ∈ M m,n are used and respectively defined by We recall the following properties:

The equation AXB = C
Consider the matrix equation where A ∈ M p,m has full column-rank, B ∈ M n,q has full row-rank, C ∈ M p,q is a known constant matrix, and X ∈ M m,n is unknown. The hypotheses imply the invertibility of A T A and BB T , and thus we obtain the unique solution to be However, to compute (A T A) -1 and (BB T ) -1 requires a large amount of data storage if the sizes of matrices are large. Thus, in this section, we shall propose a new iterative method to solve (2) based on gradients and the steepest descend which provides an appropriate sequence of convergent factors for minimizing an error at each iteration. Moreover, the discussion in this section leads to a treatment for a general matrix equation in Sect. 4.

Proposed algorithm
We consider the Frobenius norm-error AXB -C F which can be equally transform into (B T ⊗ A) Vec(X) -Vec(C) F via Lemma 2.1(iii). So, we define the quadratic norm-error function f : R mn → R by We know that a norm function is a convex function, so f is convex. We assume that the exact solution X * of (2) is uniquely determined, hence an optimal matrix X * of f exists. We start by having an arbitrary initial matrix X(0) and then at every step k > 0 we iteratively move to the next matrix X(k + 1) along an appropriate direction, i.e., the negative gradient of f , together with a suitable step size. In the kth step, the step size τ k+1 is changed appropriately in order to incur the minimum error. The gradient-descent iterative method thus can be described through the following recursive rule: In order to do that, we recall the following gradient formula: Now, we find the gradient of function f and deduce its derivatives in detail. Letting S = B T ⊗ A, x = Vec(X), and c = Vec(C), we have Thus, our new iterative equation is in the form Using Lemma 2.1, we have Next, we choose a step size. To generate the best step size at each iteration, we minimize an error which occurs at the next iteration, X(k + 1). Then, for each k ∈ N ∪ 0, we define Now, we shall minimize the function φ k+1 (τ ) by applying the properties of matrix trace. Before that, we may transform φ k+1 (τ ) into a convenient form by letting c = c -Sx and b = SS T c, so that Differentiating both sides, we have Note that the second derivative of φ k+1 (τ ) is the constant tr(bb T ), which is positive. Setting dφ k+1 (τ )/dτ = 0 and using Lemma 2.1(iii), we obtain the minimizer of φ k+1 (τ ) as follows: Summarizing the direction and the step size altogether, we get: The gradient-descent iterative algorithm for solving (2). Initialization step. Given any small error > 0, choose an initial matrix X(0). Set k := 0.
Set k := k + 1 and return to Stopping rule.
Remark 3.2 In Algorithm 3.1, we introduce the matrices A, B, and E(k) to avoid duplicate manipulations. The term E(k) or E(α, β) in the denominator of the formula of τ k+1 does not cause a severe propagation of errors when X(k) is close to the exact solution. This is because the Stopping rule prevents E(α, β) from being a very small number, and there is also the term E(α, β) in the numerator. A similar comment is applied to any developed algorithms in this paper.

Convergence of the algorithm
Here, we will prove that Algorithm 3.1 converges to the exact solution. The following analysis will hold for strongly convex functions. Recall that a twice-differentiable convex (2) is consistent and has a unique solution X * , then the iterative sequence {X(k)} generated by Algorithm 3.1 converges to X * for any initial matrix X(0), i.e., X(k) → X * as k → ∞.

Theorem 3.4 If
Proof If ∇f (Vec(X(k))) = 0 for some k, then X(k) = X * and the result holds. So assume that ∇f (Vec(X(k))) = 0 for all k. To investigate its convexity, let us find the second derivative.
Indeed, we have from (4) and Lemma 2.1 that For convenience, we write λ min and λ max instead of λ min (BB T ⊗A T A) and λ max (BB T ⊗A T A), respectively. Since BB T ⊗ A T A is symmetric, we have meaning that f is strongly convex. Considering φ k+1 (τ ) = f (Vec(X(k + 1))) and applying (6) in Lemma 3.3, we obtain The right-hand side is minimized by τ k+1 = 1/λ max , and It follows from (5) that We find that τ = 1/λ min minimizes the RHS of (8), i.e., Substituting (9) into (7) and then putting c := 1λ min /λ max , we have We obtain inductively that Since A has full column-rank and B has full row-rank, BB T ⊗ A T A is invertible. It follows that BB T ⊗ A T A is positive definite, which implies λ min > 0 and thus 0 < c < 1. Hence, f (Vec(X(k))) → 0 as k → ∞.

The equation p t=1 A t XB t
In this section, we consider the generalized Sylvester equation where for each t = 1, . . . , p A t ∈ M q,m is a full column-rank matrix, B t ∈ M n,r is a full rowrank matrix, C ∈ M q,r is a known constant matrix, and X ∈ M m,n is an unknown matrix. An equivalent condition for (10) to have a unique solution is that P = p t=1 B T t ⊗ A t is invertible. Its unique solution is given by We shall introduce a new iterative method for solving (10) based on gradients and the steepest descend which provides an appropriate sequence of convergent factors for minimizing an error at each iteration.

Proposed algorithm
We define the quadratic norm-error functionf : It is obvious thatf is convex. For convenience, we let P = p t=1 B T t ⊗ A. We assume that P is invertible, then the exact solution exists. The gradient-descent iterative method therefore can be described through the following recursive rule: To search for the direction, we use the same techniques as in the previous section and then obtain ∇f Vec(X) = P T P Vec(X) -Vec(C) .

Thus, our new iterative equation is in the form
Using Lemma 2.1, we obtain Next, we choose a step size. With the same technique as in the previous section, we minimizeφ : [0, ∞) → R by for each k = 0, 1, . . . ,φ k+1 (τ ) :=f (X(k + 1)). Similarly, the minimizer of functionφ k+1 (τ ) is Summarizing the direction and the step size altogether, we get: The gradient-descent iterative algorithm for solving (10). Initialization step. Given any small error > 0, choose an initial matrix X(0).
Set k := k + 1 and return to the Stopping rule.

Convergence analysis of the algorithm
In this subsection, we shall show that Algorithm 4.1 is applicable for any choice of the initial matrix X(0) as long as equation (10) has a unique solution. After that, we shall discuss error estimates and the asymptotic convergence rate of the algorithm. (10) is consistent and has a unique solution X * , or equivalently, P is invertible, then the iterative sequence {X(k)} generated by Algorithm 4.1 converges to X * for any initial matrix X(0), i.e., X(k) → X * as k → ∞.

Theorem 4.2 If
Proof Convergence of Algorithm 4.1 can be proved similarly as in Theorem 3.4. In this case, we have λ min P T P I ∇ 2f Vec X(k) = P T P λ max P T P I, which implies the strong convexity off . In a similar manner, we get wherec := 1λ min (P T P)/λ max (P T P). By induction, we obtaiñ f Vec X(k) ≤c kf Vec X(0) .
The uniqueness of the solution implies that P is positive definite, and thus 0 <c < 1. Hence, f (Vec(X(k))) → 0 as k → ∞.
From now on, we denote κ = κ(P), the condition number of P. Observe thatc = 1κ -2 . According to Lemma 2.1(iii), the bounds (12) and (13) give rise to the following estimates: Since 0 <c < 1, it follows that if We can summarize the above discussion as follows: Moreover, the error estimates p t=1 A t X(k)B t -C F compared to the previous iteration and the first iteration are provided by (14) and (15), respectively. In particular, the relative error at each iteration gets smaller than the previous (nonzero) error, as in (16).
In particular, the convergence rate of the algorithm is governed by Proof Utilizing equation (15) and Lemma 2.2, we have Since the asymptotic behavior of the above error depends on the term (1κ -2 ) k 2 , the asymptotic convergence rate for the algorithm is governed by √ 1κ -2 . In a similar manner but making use of (14) instead of (15), we obtain and hence (17) is reached.
Thus, the condition number κ determines the asymptotic convergence rate, as well as how far our initial matrix was from the exact solution. The closer κ gets to 1, the faster the algorithm converges to the required result.

Numerical simulations for a class of the generalized Sylvester matrix equations
In this section, we present applications of our proposed algorithms to the certain linear matrix equations. To show the effectiveness and capability of our algorithms, we compare our proposed algorithms to the mentioned existing algorithms as well as the direct methods (3) and (11). For convenience, we abbreviate TauOpt to represent our algorithms. To measure the computational time taken for each program, we apply the tic and toc functions in MATLAB and abbreviate CT for it. The readers are recommended to consider all reported results, such as errors, CTs, figures, while comparing the performance of any algorithms. To measure the error at the kth step of the iteration, we consider the following error: All iterations have been carried out by MATLAB R2018a, Intel(R) Core(TM) i7-6700HQ CPU @ 2.60 GHz, 8.00 GB RAM, PC environment. We choose the initial matrix X(0) = 10 -6 ones(3, 3) where ones(m, n) denotes the m × n matrix with contains 1 at every position. After running Algorithm 3.1, the numerical so-  In this example, we compare Algorithm 3.1 with GI (Proposition 1.2) and LS (Proposition 1.3). All reports are presented after running 100 iterations. Table 1 shows the errors at the final iteration as well as the computational time. Figure 1 displays the error plot. Table 1 implies that our algorithm takes significantly less computational time than the direct method. For comparison to other two algorithms, it seems that our algorithm takes a little more time but both Table 1 and Fig. 1 indicate that ours obtains a highly satisfactory approximated solution. We compare Algorithm 4.1 with the following algorithms: GI [32], RGI [34], MGI [35], JGI (Algorithm 4, [36]) and AJGI (Algorithm 5, [36]). The results after running for 100 iterations are shown in Fig. 2 and Table 2. According to the error and CT in Table 2 and    Figure 3 Errors for Example 5.3 We choose an initial matrix X(0) = 10 -6 ones(3, 3). After running Algorithm 4.1, the numerical solutions converge to the exact solution In this example, we compare Algorithm 4.1 with GI (Proposition 1.1) and LSI (Proposition 1.4). The results after 100 iterations are shown in Fig. 3 and Table 3. We find that Algorithm 4.1 gives the fastest convergence.

An application to a discretization of one-dimensional heat equation
In this section, we apply our proposed algorithm to a discretization of one-dimensional heat equation: subject to the boundary conditions u(α x , t) = g l , u(β x , t) = g r , u(x, 0) = g d where g l , g r , g d are given functions.

Discretization of the heat equation
We make discretization at the grid points in the rectangle which are at ( We denote u ij = u(x i , t j ). By the Forward Time Central Space (FTCS) method, we obtain or equivalently, We transform equation (19) into a linear system of N x N t equations in N x N t unknowns u 11 , . . . , u N x N t : where U = [u ij ], T H ∈ M N x N t has N t × N t blocks of the form I N x on its diagonal and tridiag(-F, -(1 -2F), -F) under its diagonal. Here is an example of T H where N t = 3 and N x = 2: Equation (21) is formed as AXB = C where A = T H , X = Vec(U), B = I and C = V . According to Algorithm 3.1, we obtain an algorithm for (21) as follows: The gradient-descent iterative algorithm for solving one-dimensional heat equation.
Input step. Input N x , N t ∈ N as numbers of partition. Initialization step. Let h x and h t be as in (20) and, for each i = 1, . . . , N x and j = 1, . . . , N t , x i = α x + ih x and t j = jh t , compute s = T H V , S = T 2 H , s = T H s, and S = T H S. Choose u(0) ∈ R N x N t and set k := 0. Updating step. Compute Set k := k + 1 and repeat the Updating step.
Here, we denote s p the pth entry of a vector s and S pq the (p, q)-entry of S. To stop the algorithm, an appropriate stopping rule is V -T H u(k) 2 F < where is a small positive number.

Numerical simulation for the heat equation
To obtain the numerical solutions, we need to partition the rectangular domain. The accuracy of the solution depends on the size of the partition grid. A better accuracy must be from a finer grid system and it causes the size of the associated matrix T H to be larger. Let c = 1, N x = 4, h t = 0.01. We have h x = 0.2 and F = 0.25. In this case, we consider N t = 10, so the size of the matrix T H is 40 × 40. We run Algorithm 6.1 with the initial vector u(0) = 10 -6 [1 · · · 1] T and the numerical solutions converge to the exact solution u * (x, t) = e -π 2 t sin(πx).

An application to a discretization of two-dimensional Poisson's equation
In this section, we give an application of the proposed algorithm to a discretization of two-dimensional Poisson's equation: with the boundary conditions u(x, β y ) = g u , u(x, α y ) = g d , u(α x , y) = g l , and u(β x , y) = g r where g u , g d , g l , g r are given functions. Notice that the two-dimensional Laplace's equation is a homogeneous case of the Poisson's equation when the RHS function is zero, i.e., f (x, y) = 0.

Discretization with rectangular grid
We make discretization at the grid points in the rectangle which are at (x i , y j ) with x i = α x + ih x and y j = α y + jh y where , as well as g u , g d , g l , g r . By the standard finite difference approximation, we obtain or equivalently, for 1 ≤ i ≤ N x , 1 ≤ j ≤ N y . Now, we can convert the differential equation (22) to a linear system of N x N y equations in N x N y unknowns u 11 , . . . , u N x N y : where U = [u ij ], T 1 has N y × N y blocks of the form tridiag(-1, 2, -1) of N x × N x on its diagonal and T 2 also has N y × N y blocks of the form 2I N x on its diagonal and -I N x blocks on its off-diagonals. The boundary conditions produce constant vectors g u , g d , g l , g r at the RHS of (25) as follows: g u = g u x 1 ,βy g u x 2 ,βy · · · g u x Nx ,βy 0 · · · 0 T , g d = 0 · · · 0 g d x 1 ,αy g d x 2 ,αy · · · g d x Nx ,αy T , g l = g l αx,y Ny 0 · · · 0 g l αx,y Ny-1 0 · · · g l αx,y 1 0 · · · 0 T , g r = 0 · · · 0 g r βx,y Ny 0 · · · 0 g r βx,y Ny-1 0 · · · g r βx,y 1 T . Note that, for the Laplace's equation, equation (25) will be reduced to Equation (25) is formed as According to Algorithm 3.1, we obtain an algorithm for the rectangular-grid case as follows: The gradient-descent iterative algorithm for solving two-dimensional Poisson's equation.
Input step. Input N x , N y ∈ N as numbers of partition. Initialization step. Let h x and h y be as in (23) and for each i = 1, . . . , N x and j = 1, . . . , N y , Set k := k + 1 and repeat the Updating step.
Here, we denote by s p the pth entry of a vector s and by S pq the (p, q)-entry of S. In case of solving the two-dimensional Laplace's equation, initially compute c = h 2 x (g u + g d ) + h 2 y (g l + g r ). To stop the algorithm, a reasonable stopping rule is c -T N u(k) 2 F < where is a small positive number. Since the coefficient matrix T N is sparse, the error norm can be described more precisely:

Discretization with square grid
Now, we consider the Poisson's equation ( where T r = tridiag(-1, 2, -1) ∈ M N . Thereby, (25) can be transformed into or equivalently, T r U + UT r = G where G = -h 2 Vec([f ij ]) + g u + g d + g l + g r . Thus (26) can be solved by Algorithm 4.1 where P = T N .
To have the condition number of T N , we consider the smallest and largest eigenvalues of T r which are given respectively by (see, e.g., [42]) . Thus, the condition number of T N for large N is

Corollary 7.2
The discretization (26) of the Poisson's equation (22) can be solved by using Algorithm 7.1 in which c = -h 2 Vec[f ij ] + g u + g d + g l + g r so that the approximate solution u(k) converges to the exact solution u * for any initial vector u(0). The convergence rate of the algorithm is governed by 1κ -2 T N , where κ T N is given by (27). Moreover, the error estimates are given as follows:

Numerical simulations for the Poisson's equation
Example 7. 3 We consider an application of our algorithm to the two-dimensional Poisson's equation (22) with f (x, y) = -2π 2 sin(πx) sin(πy), 0 < x < 1, 0 < y < 1, and the boundary condition u = 0 on the boundary of the rectangle. It is called a Dirichlet problem. We choose an initial vector u(0) = 10 -6 [1 · · · 1] T . We run Algorithm 7.1 with the rectangular grid of 10 × 20 which causes the size of the matrix T N to be 200 × 200. The analytical solution is u * (x, y) = sin(πx) sin(πy).
In this example, we provide only a comparison of numerical and analytical solutions in Table 6, and a 3D-plot of both solutions in Fig. 6. We run Algorithm 7.1 with the initial vector u(0) = [1 · · · 1] T . We choose two grid partitions: one has h x = 0.25, h y = π/4 and the other has h x = 0.0625, h y = π/32. So the sizes   of the matrix T N are 9 × 9 and 465 × 465, respectively. A comparison of numerical and analytical results is shown in Table 7. Figure 7 displays a 3D-plot of the numerical and the analytical results for the latter grid partition. Note that the analytical solution is u * (x, y) = e x sin y.

Conclusion
The proposed gradient-descent based iterative algorithm is well suited for solving the generalized Sylvester matrix equation, p t=1 A t XB t = C. Such matrix equation can be reduced to a class of well-known linear matrix equations such as the Sylvester equation, the Kalman-Yakubovich equation, and so on. The proposed algorithm is applicable for any problems as long as A t and B t have full column-rank and full row-rank, respectively, for all t. The convergence rate of the algorithm is governed by √ 1κ -2 where κ is the