Solvability and optimal stabilization controls of discrete-time mean-field stochastic system with infinite horizon

This paper addresses the optimal control and stabilization problems for indefinite discrete-time mean-field systems over an infinite horizon. First, we show the convergence of the generalized algebraic Riccati equations (GAREs) and establish their compact form, the GARE. By working with the GARE, we derive the existence of the maximal solution to the original GAREs and show that the maximal solution is the stabilizing solution. The maximal solution is then used to design the linear-quadratic (LQ) optimal controller and to express the optimal value of the control problem. Specifically, we deduce that under the assumption of exact observability, the mean-field system is $L^2$-stabilizable if and only if the GAREs have a solution, which is also the maximal solution. The solvability of the GAREs is discussed via the semi-definite programming (SDP) method. Our results generalize and improve previous results. Finally, numerical examples illustrate the validity of the obtained results.


Introduction
We consider indefinite discrete-time mean-field LQ (MF-LQ) optimal control problems over an infinite horizon. The system equation is the following stochastic difference equation (SDE):

$$x_{k+1} = (A x_k + \bar{A}\,\mathbb{E}x_k + B u_k + \bar{B}\,\mathbb{E}u_k) + (C x_k + \bar{C}\,\mathbb{E}x_k + D u_k + \bar{D}\,\mathbb{E}u_k)\,w_k, \quad k = 0, 1, \ldots, \tag{1.1}$$

where $x_k \in \mathbb{R}^n$ and $u_k \in \mathbb{R}^m$ are the state and control processes, respectively. $\mathbb{E}$ is the expectation operator and $\{w_k\}_{k \ge 0}$ is a martingale difference sequence, defined on a complete filtered probability space $(\Omega, \mathcal{F}, \{\mathcal{F}_k\}_{k \ge 0}, \mathbb{P})$, in the sense that $\mathbb{E}[w_k \mid \mathcal{F}_k] = 0$, $\mathbb{E}[w_k^2 \mid \mathcal{F}_k] = \ldots$, where $\mathcal{F}_k$ is the $\sigma$-field generated by $\{\xi, w_0, \ldots, w_{k-1}\}$, $\mathcal{F}_0 = \{\emptyset, \Omega\}$. The initial value $\xi$ and $w_k$ are assumed to be independent of each other. $A, \bar{A}, C, \bar{C} \in \mathbb{R}^{n \times n}$ and $B, \bar{B}, D, \bar{D} \in \mathbb{R}^{n \times m}$ are given deterministic matrices. System (1.1) is also regarded as a mean-field SDE (MF-SDE).

Mean-field theory has been developed to investigate the collective behaviors that arise from individuals' mutual interactions in various physical and sociological dynamical systems. The present problem combines mean-field theory with LQ stochastic optimal control (see [1,2]). Recently, mean-field problems have found many constructive and significant applications in mathematical finance, statistical mechanics, and game theory (see [3]), and especially in stochastic optimal control (see [4]). Representative works on mean-field optimal control include, to name a few, Li and Liu [5] and Ma and Liu [6][7][8]. The MF-LQ optimal control problems in particular have received considerable attention. For the continuous-time case, Yong [9] studied LQ optimal control problems for mean-field stochastic differential equations by a variational method and a decoupling technique; the same author in [10] systematically investigated the open-loop and closed-loop equilibrium solutions for time-inconsistent MF-LQ optimal control problems. Subsequently, Huang et al. [11] generalized the results of Yong [9] to the infinite horizon.
Nevertheless, discrete-time optimal control problems are more relevant to biomedicine, engineering, economics, operations research, and the optimization of complex technological problems. Recently, Elliott et al. [12] formulated the finite horizon discrete-time MF-LQ optimal control problem as an operator stochastic LQ optimal control problem. Later, the same authors in [13] discussed the infinite horizon case. Ni et al. [14] considered the indefinite mean-field stochastic LQ optimal control problem with finite horizon. Moreover, Song and Liu [15] derived the necessary and sufficient solvability condition of the finite horizon MF-LQ optimal control problem. In particular, Zhang et al. [16] presented the necessary and sufficient stabilization conditions of the MF-LQ optimal control problem subject to system (1.1). However, the stabilization results in [16] mainly rely on a restrictive definiteness condition on the weighting matrices $Q$, $\bar{Q}$, $R$, $\bar{R}$, which is critical to that study of the MF-LQ optimal control problems. It is, therefore, natural to ask whether similar results can be derived if $Q$, $\bar{Q}$, $R$, $\bar{R}$ are merely assumed to be symmetric, a question of significant mathematical interest. Inspired by the above arguments, in this paper we consider the following cost functional subject to system (1.1):

$$J(\xi, u) = \sum_{k=0}^{\infty} \mathbb{E}\big[x_k' Q x_k + (\mathbb{E}x_k)' \bar{Q}\,\mathbb{E}x_k + u_k' R u_k + (\mathbb{E}u_k)' \bar{R}\,\mathbb{E}u_k + 2 x_k' G u_k + 2 (\mathbb{E}x_k)' \bar{G}\,\mathbb{E}u_k\big]. \tag{1.2}$$

Here, the cost functional contains explicit cross terms between the state and control processes, namely $G \neq 0$ and $\bar{G} \neq 0$. More importantly, the weighting matrices $Q$, $\bar{Q}$, $R$, $\bar{R}$ are only required to be symmetric, which is distinctly different from [16].
To the best of our knowledge, the study of necessary and sufficient stabilization conditions for discrete-time MF-LQ stochastic optimal control problems, especially with indefinite weighting matrices, is fairly scarce in the literature. Besides, the stabilization properties have not been investigated systematically. In this paper, we design the LQ optimal controller by means of the GAREs (see [17]), and we obtain the existence of the maximal solution to the original GAREs by introducing a compact-form GARE. Then, we show that the stabilizing solution is the maximal solution, which is employed to present the optimal value. Furthermore, under the assumption of exact observability (see [18,19]), we derive that the mean-field system is $L^2$-stabilizable if and only if the GAREs have a solution, which is also the maximal solution. Finally, we discuss the solvability of the GAREs by the SDP method (see [20][21][22][23]), and we establish the relations among the GAREs, the SDP, and the MF-LQ optimal control problems.
The remainder of the paper is organized as follows. The next section gives the problem formulation and preliminaries. Section 3 is devoted to studying the GAREs. In Sect. 4, we discuss the solvability and stabilization for the MF-LQ optimal control problems. Section 5 establishes the relations among the GAREs, the SDP, and the MF-LQ optimal control problems. A couple of numerical examples are given in Sect. 6 to illustrate our main results. Section 7 gives some concluding remarks.
Most notations adopted in the paper are standard. $A > 0$ / $A \ge 0$ means that $A$ is strictly positive definite / positive semi-definite. $A'$ denotes the transpose of any matrix or vector $A$. $B^{-1}$ stands for the inverse of a real matrix $B$. $\dim(A)$ / $R(A)$ / $\mathrm{Ker}(A)$ is the dimension / rank / kernel of $A$. $\mathbb{T}^n$ represents the set of $n \times n$ symmetric matrices. Denote by $\mathcal{X}$ the space of all $\mathbb{R}^n$-valued square-integrable random variables. Let $\mathbb{N} = \{0, 1, \ldots, N\}$,

Problem formulation and preliminaries
In this paper, we study the infinite horizon MF-LQ optimal control problems. To make the problems meaningful, the infinite horizon solution is also required to guarantee closed-loop stability, which is an essential difference from the finite horizon case. We first introduce the admissible control set [...] The infinite horizon MF-LQ optimal control problem to be solved can be stated as follows.
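where $u^*$ is called an optimal control, and $V(\cdot)$ is called the value function of Problem A.

Definition 2.1 For a matrix $A \in \mathbb{R}^{n \times m}$, the Moore-Penrose inverse of $A$ is defined to be the unique matrix $A^{\dagger} \in \mathbb{R}^{m \times n}$ such that $A A^{\dagger} A = A$, $A^{\dagger} A A^{\dagger} = A^{\dagger}$, $(A A^{\dagger})' = A A^{\dagger}$, $(A^{\dagger} A)' = A^{\dagger} A$.

[...] is $L^2$-asymptotically stable. In this case, $u_k = K x_k + \bar{K}\,\mathbb{E}x_k$ ($k \in \tilde{\mathbb{N}}$) is called the closed-loop $L^2$-stabilizer.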
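The Moore-Penrose inverse of Definition 2.1 is characterized by the four Penrose conditions $A A^{\dagger} A = A$, $A^{\dagger} A A^{\dagger} = A^{\dagger}$, and symmetry of $A A^{\dagger}$ and $A^{\dagger} A$. The following minimal sketch checks them numerically for a singular rank-one matrix $A = u v'$, for which the closed form $A^{\dagger} = v u' / (|u|^2 |v|^2)$ is known. The helper names and the test vectors are illustrative assumptions, not part of the paper, and the code is kept dependency-free on purpose.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def close(X, Y, tol=1e-12):
    """Entrywise comparison of two matrices of equal shape."""
    return all(abs(a - b) <= tol
               for rx, ry in zip(X, Y) for a, b in zip(rx, ry))


def rank_one_pinv(u, v):
    """Pseudoinverse of A = u v' for nonzero u, v: v u' / (|u|^2 |v|^2)."""
    scale = sum(a * a for a in u) * sum(b * b for b in v)
    return [[vj * ui / scale for ui in u] for vj in v]


def penrose_conditions(A, Ad):
    """Return the truth values of the four Penrose conditions."""
    AAd, AdA = matmul(A, Ad), matmul(Ad, A)
    return (close(matmul(AAd, A), A),       # A A+ A = A
            close(matmul(AdA, Ad), Ad),     # A+ A A+ = A+
            close(transpose(AAd), AAd),     # A A+ symmetric
            close(transpose(AdA), AdA))     # A+ A symmetric


# Hypothetical data: a 2x3 matrix of rank one, so no ordinary inverse exists.
u, v = [1.0, 2.0], [3.0, 0.0, 4.0]
A = [[ui * vj for vj in v] for ui in u]
A_dagger = rank_one_pinv(u, v)              # 3x2
checks = penrose_conditions(A, A_dagger)
print(checks)
```

In the indefinite setting the pseudoinverse matters because the matrix multiplying the control in a Riccati-type equation may be singular, so an ordinary inverse need not exist.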

Definition 2.4 Consider the uncontrolled mean-field system [...] System (2.1) (or $(A, \bar{A}, C, \bar{C}, Q^{1/2})$, for short) is said to be exactly observable if, for any $N \ge 0$, [...]

In what follows, we make two assumptions.
We establish the following maximum principle, which is the basis for deriving the main results. Define [...] where $P_{N+1}, \bar{P}_{N+1} \in \mathbb{T}^n$. The corresponding admissible control set is given as [...] $|u_k|^2 < \infty$ and $J_N < \infty$.
Remark 2.1 In most previous works, the maximum principle for the MF-LQ optimal control problem was based on the mean-field backward stochastic differential equation (see [25,26]), whereas Proposition 2.1 provides a convenient calculation method and reduces to the standard stochastic LQ case.

where $\rho^{(i)}$, $M^{(i)}$ are given by [...]

Proof For any $(\hat{P}, \hat{\bar{P}}) \in \Gamma$, define the new GAREs (NGAREs) [...] The corresponding new cost functional is given by [...] To make the time horizon $N$ specific in the finite horizon MF-LQ problem, we denote $T_k$ [...], where $x_l = \xi$. According to (3.5), for $l_1 < l_2$, we get [...]

By the time-invariance of the coefficient matrices, it is obtained that $T_l(N) = T_0(N - l)$, $\bar{T}_l(N) = \bar{T}_0(N - l)$. Combining (3.5) with (3.6), for any $x_0 = \xi \in \mathcal{X}$ and any stabilizing controller $u_k = L x_k + \bar{L}\,\mathbb{E}x_k$, it follows that [...] where $c > 0$ is a constant. Selecting $x_0 \in \mathbb{R}^n$, we claim that, for any $N$, $l$, [...] Meanwhile, let $x_0 = \phi\delta$ with $\phi \in \mathbb{R}^n$ and $\mathbb{P}(\delta = -1) = \mathbb{P}(\delta = 1) = \frac{1}{2}$. By virtue of (3.8), we obtain [...] (3.10)

From (3.7) and (3.9)-(3.10), we get [...] Here, $C_1$, $\bar{C}_1$ are bounded. Taking $l \to -\infty$ and letting $P_k = T_k + \hat{P}$, $\bar{P}_k = \bar{T}_k + \hat{\bar{P}}$, then $P_k$, $\bar{P}_k$ increase with respect to $k$, and $(P_k, \bar{P}_k)$ converges to $(P, \bar{P})$ with $(P, \bar{P})$ satisfying GAREs (3.2).
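The two structural facts used in this proof, shift invariance $T_l(N) = T_0(N - l)$ and monotone convergence as the horizon grows, can be observed on a simpler stand-in. The sketch below iterates the classical scalar stochastic LQ Riccati recursion without mean-field terms, not the NGAREs of the paper. It assumes a unit conditional variance for the noise, a zero terminal value, and hypothetical coefficients chosen so that the stage cost is nonnegative.

```python
import math


def riccati_backward(horizon, A=0.9, B=1.0, C=0.3, D=0.2,
                     Q=1.0, R=1.0, G=0.1):
    """Backward scalar Riccati recursion for the classical problem
    x_{k+1} = (A x + B u) + (C x + D u) w, stage cost Q x^2 + 2 G x u + R u^2.
    Returns [T_0, ..., T_{N+1}] with terminal value T_{N+1} = 0.
    All coefficients are hypothetical illustration values."""
    T = [0.0] * (horizon + 2)
    for k in range(horizon, -1, -1):
        nxt = T[k + 1]
        denom = R + (B * B + D * D) * nxt      # coefficient of u^2
        cross = (A * B + C * D) * nxt + G      # coefficient of x*u (halved)
        T[k] = Q + (A * A + C * C) * nxt - cross * cross / denom
    return T


# Shift invariance: the value at time l with horizon N only depends on N - l.
N, l = 12, 5
shift_ok = math.isclose(riccati_backward(N)[l], riccati_backward(N - l)[0])

# Monotone convergence of T_0(N) as the horizon N grows.
values = [riccati_backward(h)[0] for h in range(30)]
monotone_ok = all(b >= a - 1e-12 for a, b in zip(values, values[1:]))
print(shift_ok, monotone_ok, values[-1])
```

Because the coefficients do not depend on time, both horizons run the same number of identical backward steps, which is why the shifted values coincide.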

Remark 3.1 In view of the regular conditions in (3.3), we have [...] Besides, [...]

Then the optimal controller is designed as $u_k = L_k x_k + \bar{L}_k\,\mathbb{E}x_k$, where $L_k$, $\bar{L}_k$ are given as (3.11), and the optimal value is derived by [...] $N > 0$. Using the maximum principle in Proposition 2.1, it follows that [...] Adding from $k = l + 1$ to $k = N$ on both sides of the above equation, we obtain [...] Furthermore, [...]

Now we prove $\alpha^{(1)}_l > 0$ and $\alpha^{(2)}_l > 0$. Let $x_l = 0$; then for any $u_l \neq 0$, $\bar{J}(l) > 0$. Following the discussion of (3.12), this implies that $\alpha^{(1)}_l > 0$ and $\alpha^{(2)}_l > 0$. Namely, $\alpha^{(1)}$ [...] The proof is completed.
Generally speaking, the uniqueness of the solution to the GAREs is not guaranteed. Next, we shall focus on the properties of the maximal solution and its relation with the stabilizing solution.
By Definition 3.2, it is clear that the maximal solution must be unique if it exists.

Now, we introduce a compact form of GAREs (3.2). LetP
By considering GARE (3.13), we can immediately show the existence of the maximal solution to GAREs (3.2).

Proof (i) It is clear.
Problem B Find an $\mathcal{F}_k$-measurable $u_k$ that minimizes the cost functional (1.2) and, simultaneously, stabilizes the mean-field system (1.1).

[...] (3.2), which is also the maximal solution. In this case, the optimal stabilizing solution and the optimal value can be designed as (4.1)-(4.2), respectively.
According to Remark 3.1, we have $Q \ge 0$, $\bar{Q} \ge 0$; further, it is obtained that [...] Adding from $0$ to $N$ on both sides of (4.8), then [...] Consequently, [...] Resorting to the method of [31, Theorem 4, Proposition 1], we get that the exact observability of system $(A, \bar{A}, C, \bar{C}, Q^{1/2})$ is equivalent to the exact observability of the following system: [...] Here, [...] Thus, by (4.9), we get $X_0 = 0$, which is a contradiction. In summary, $T > 0$ and $T + \bar{T} > 0$ hold.
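Step 4. From Theorem 3.1, we see $P_k(N) = T_k(N) + \hat{P}$, $\bar{P}_k(N) = \bar{T}_k(N) + \hat{\bar{P}}$. Recalling the convergence of $P_k(N)$, $\bar{P}_k(N)$, we have $P = T + \hat{P}$, $\bar{P} = \bar{T} + \hat{\bar{P}}$. Besides, combining the arbitrariness of $\hat{P}$, $\hat{\bar{P}}$ with $T, \bar{T} > 0$, we derive $P \ge \hat{P}$, $\bar{P} \ge \hat{\bar{P}}$, namely $(P, \bar{P})$ is the maximal solution to GAREs (3.2).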
Sufficiency. Under (A2) and $\bar{\Gamma} \neq \emptyset$, assume that GAREs (3.2) have a solution; we shall prove that system (1.1) is $L^2$-stabilizable. Following the proof of the necessity, we claim that if GAREs (3.2) have a solution $(P, \bar{P})$, then NGAREs (4.7) have a positive definite solution $(T, \bar{T})$. In addition, $P = T + \hat{P}$, $\bar{P} = \bar{T} + \hat{\bar{P}}$. Since $K = L$, $\bar{K} = \bar{L}$, the stabilization of system (1.1) with $u_k = K x_k + \bar{K}\,\mathbb{E}x_k$ is equivalent to the stabilization of system (1.1) with $u_k = L x_k + \bar{L}\,\mathbb{E}x_k$. Together with Remark 3.1, (4.8) can be reformulated as

$$\cdots\,(\lambda^{(2)})' S^{(2)} \lambda^{(2)} \begin{bmatrix} x_k - \mathbb{E}x_k \\ u_k - \mathbb{E}u_k \end{bmatrix} \ge 0,$$

which implies that $V_T(k, x_k)$ is decreasing with respect to $k$. Along with $V_T(k, x_k) \ge 0$, we can deduce that $V_T(k, x_k)$ is convergent. Adding from $m$ to $m + N$ on both sides of the above equation and taking the limit, we get [...] By a time-shift, it yields that [...] We further obtain [...]; hence, $\lim_{m \to +\infty} \mathbb{E}(x_m' x_m) = 0$. Namely, system (1.1) is $L^2$-stabilizable. Using Theorem 4.1, the optimal controller and optimal value can be designed as (4.1)-(4.2), respectively.
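The quadratic form in the sufficiency argument is written in the centered variables $x_k - \mathbb{E}x_k$ and $u_k - \mathbb{E}u_k$. As a worked step, and assuming only what is stated for system (1.1), namely $\mathbb{E}[w_k \mid \mathcal{F}_k] = 0$ and $\mathcal{F}_k$-measurable controls, the dynamics split into a deterministic mean part and a centered part, and the second moment splits accordingly:

```latex
\begin{align*}
\mathbb{E}x_{k+1} &= (A + \bar{A})\,\mathbb{E}x_k + (B + \bar{B})\,\mathbb{E}u_k, \\
x_{k+1} - \mathbb{E}x_{k+1} &= A\,(x_k - \mathbb{E}x_k) + B\,(u_k - \mathbb{E}u_k)
  + \bigl(C x_k + \bar{C}\,\mathbb{E}x_k + D u_k + \bar{D}\,\mathbb{E}u_k\bigr)\,w_k, \\
\mathbb{E}|x_k|^2 &= \mathbb{E}|x_k - \mathbb{E}x_k|^2 + |\mathbb{E}x_k|^2 .
\end{align*}
```

The first line follows by taking expectations in (1.1), since the noise term has zero mean; the second by subtracting the first from (1.1). By the third line, $\mathbb{E}(x_m' x_m) \to 0$ holds exactly when both the mean and the centered part vanish in the limit.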
Remark 4.2 Our results extend and improve the ones in [16]. Besides, Theorem 4.2 makes it clear that the solvability of GAREs (3.2) with indefinite weighting matrices is equivalent to the solvability of NGAREs (4.7) with positive semi-definite weighting matrices. It also indicates that stabilization problems with indefinite weighting matrices can be reduced to the positive semi-definite case. These conclusions suggest a new way to approach indefinite MF-LQ optimal control problems, and especially their stabilization problems.
Remark 4.3 Theorem 4.2 presents the necessary and sufficient stabilization condition for the indefinite MF-LQ optimal control problem, whereas in most previous works stabilization was a precondition for the indefinite control problems. In other words, those works only discussed the existence of the stabilizing solution to the GAREs under the assumption that the system is stabilizable.

Characterizing MF-LQ problem via SDP
In this section, we present some results with respect to the SDP problem. Meanwhile, we establish the relations among the GAREs, the SDP, and the MF-LQ optimal control problems.
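Definition 5.1 Let a vector $a = (a_1, a_2, \ldots, a_m)' \in \mathbb{R}^m$ and matrices $F_0, F_1, \ldots, F_m \in \mathbb{T}^n$ be given. The following optimization problem:

$$\min\ a'x, \quad \text{subject to } F(x) = F_0 + \sum_{i=1}^{m} x_i F_i \ge 0, \tag{5.1}$$

is called an SDP. Besides, the dual problem of SDP (5.1) is defined as

$$\max\ -\mathrm{Tr}(F_0 Z), \quad \text{subject to } Z \in \mathbb{T}^n,\ \mathrm{Tr}(Z F_i) = a_i,\ i = 1, 2, \ldots, m,\ Z \ge 0.$$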
(5.2)

Theorem 5.1 SDP problem (5.2) admits a unique optimal solution, which is also the maximal solution to GAREs (3.2).
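For the generic primal-dual pair of Definition 5.1, weak duality can be checked in one line. The derivation below assumes that the primal constraint has the standard linear matrix inequality form $F(x) = F_0 + \sum_{i=1}^{m} x_i F_i \ge 0$; it takes any primal-feasible $x$ and any dual-feasible $Z$:

```latex
\begin{align*}
a'x - \bigl(-\mathrm{Tr}(F_0 Z)\bigr)
  &= \sum_{i=1}^{m} x_i\,\mathrm{Tr}(Z F_i) + \mathrm{Tr}(F_0 Z)
  && \text{(dual constraint } \mathrm{Tr}(Z F_i) = a_i\text{)} \\
  &= \mathrm{Tr}\bigl(Z\,F(x)\bigr) \;\ge\; 0
  && \text{(since } Z \ge 0 \text{ and } F(x) \ge 0\text{)} .
\end{align*}
```

Hence every dual-feasible $Z$ yields a lower bound $-\mathrm{Tr}(F_0 Z)$ on the primal objective, and the two values coincide only if $\mathrm{Tr}(Z F(x)) = 0$.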
On the other hand, $u^* = K_1 x_k + (K_2 - K_1)\,\mathbb{E}x_k$ is a stabilizing control. Following the proof of [11, Theorem 6.7], it follows that $(P^*, \bar{P}^*)$ is the upper bound of the set $\Gamma$; in other words, $(P^*, \bar{P}^*)$ is the maximal solution. Furthermore, the uniqueness of the solution to SDP (5.2) follows from the maximality. The proof is completed.

Besides, when either (i) or (ii) holds, GAREs (3.2) have a maximal solution $(P^*, \bar{P}^*)$, which is the unique optimal solution to SDP problem (5.2).
Here, $(P^*, \bar{P}^*)$ is the maximal solution to GAREs (3.2), which is the unique optimal solution to SDP problem (5.2).

According to Theorem 4.2, we know that system (1.1) is not $L^2$-stabilizable. A curve of $\mathbb{E}|x_k|^2$ under control (6.1) is shown in Fig. 2. As expected, the curve is not convergent.

Since $\bar{\Gamma} \neq \emptyset$, by Theorem 4.2, there is a unique optimal controller that stabilizes system (1.1) and, at the same time, minimizes cost functional (1.2). The optimal controller is given as $u_k = -1.3333 x_k - 0.1305\,\mathbb{E}x_k$, $k \ge 0$. Under this controller, the simulation of the system state and the curve of $\mathbb{E}|x_k|^2$ are shown in Fig. 3. As expected, the curve is convergent.

In the above two cases, $\bar{\Gamma} = \emptyset$. Furthermore, when $P = -0.1604$, we get $K = -0.1027$, $\bar{K} = -0.5975$. Similarly, when $P = -0.5670$, we get $K = -1.2863$, $\bar{K} = -19.0322$. Thus, the controllers are $u_k = -0.1027 x_k - 0.5975\,\mathbb{E}x_k$ and $u_k = -1.2863 x_k - 19.0322\,\mathbb{E}x_k$, respectively. Simulation results for the curve of $\mathbb{E}|x_k|^2$ with the corresponding controllers are shown in Fig. 4 and Fig. 5, respectively. As expected, the curves are not convergent.
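Curves of $\mathbb{E}|x_k|^2$ of the kind discussed above can be produced for a scalar system without Monte Carlo sampling, because the mean and the second moment of the closed loop obey exact recursions. The sketch below is a minimal illustration under loudly stated assumptions: the system is scalar, the noise has unit conditional variance ($\mathbb{E}[w_k^2 \mid \mathcal{F}_k] = 1$), and all coefficients and gains are hypothetical placeholders, not the data of the examples in this section.

```python
def closed_loop_moments(K, K_bar, steps, mean0=0.0, second0=1.0,
                        A=1.1, A_bar=0.2, B=1.0, B_bar=0.1,
                        C=0.3, C_bar=0.1, D=0.2, D_bar=0.1):
    """Exact recursion for E|x_k|^2 of the scalar system (1.1) under the
    feedback u_k = K x_k + K_bar E x_k.

    Assumes E[w_k | F_k] = 0 and E[w_k^2 | F_k] = 1 (the latter is an
    assumption of this sketch). Coefficient defaults are hypothetical.
    Returns the list [E|x_0|^2, ..., E|x_steps|^2]."""
    # Closed-loop coefficients multiplying x_k and E x_k.
    a = A + B * K
    a_bar = A_bar + B * K_bar + B_bar * (K + K_bar)
    c = C + D * K
    c_bar = C_bar + D * K_bar + D_bar * (K + K_bar)

    m, s = mean0, second0          # E x_k and E|x_k|^2
    curve = [s]
    for _ in range(steps):
        m_sq = m * m
        # E(a x + a_bar m)^2 + E(c x + c_bar m)^2, with E[x] = m.
        s = (a * a + c * c) * s + (
            (a + a_bar) ** 2 - a * a + (c + c_bar) ** 2 - c * c) * m_sq
        m = (a + a_bar) * m
        curve.append(s)
    return curve


# x_0 with mean 0 and second moment 1, e.g. a standard normal initial state.
stabilized = closed_loop_moments(K=-1.0, K_bar=-0.2, steps=30)
uncontrolled = closed_loop_moments(K=0.0, K_bar=0.0, steps=30)
print(stabilized[-1], uncontrolled[-1])
```

With these placeholder values the first curve decays to zero and the second grows, which mirrors the qualitative contrast between the convergent and non-convergent figures.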

Concluding remarks
We have investigated the exact observability of a linear stochastic time-invariant system in this work. How to extend the various definitions to linear stochastic time-varying systems is a meaningful topic that merits further discussion. Compared with the time-invariant system, defining exact observability for the time-varying stochastic system is much more difficult and sophisticated. In addition, the necessary and sufficient stabilization conditions also deserve to be studied systematically. We therefore intend to study linear time-varying systems in depth in future work.

Figure 5 Curve of $\mathbb{E}|x_k|^2$ with initial state $x_0 \sim N(0, 1)$