Backward-forward linear-quadratic mean-field Stackelberg games

This paper studies a controlled backward-forward linear-quadratic-Gaussian (LQG) large-population system in a Stackelberg game. The leader agent has a backward state, while the follower agents have forward states. The leader agent is dominating in that its state enters the dynamics of the follower agents; on the other hand, the state-average of all follower agents enters the cost functional of the leader agent. In reality, the leader and the followers may represent two typical types of participants in market price formation: the supplier and the producers. This differs from the standard MFG literature and is mainly due to the Stackelberg structure here. By variational analysis, the consistency condition system can be represented by some fully coupled backward-forward stochastic differential equations (BFSDEs) with a high-dimensional block structure in an open-loop sense. Next, we discuss the well-posedness of this BFSDE system via the contraction mapping method. Consequently, we obtain the decentralized strategies for the leader and follower agents, which are shown to satisfy the ε-Nash equilibrium property.


Introduction
Recently, the dynamic optimization of (linear) large-population systems has attracted extensive research attention. The most significant feature of such systems is the presence of numerous individually insignificant agents, denoted by {A_i}_{i=1}^N, whose dynamics and (or) cost functionals are coupled via their state-average. To design low-complexity strategies for a large-population system, one efficient method is the mean-field game (MFG), which enables us to derive decentralized strategies. There is a large body of related work on MFGs. Since the independent works of Huang, Caines, and Malhamé [11, 12] and Lasry and Lions [13-15], MFG theory and its applications have enjoyed rapid growth. Further developments of MFG theory include Bardi [1], Bensoussan, Frehse, and Yam [4], Carmona and Delarue [6], Garnier, Papanicolaou, and Yang [8], Guéant, Lasry, and Lions [9], and the references therein.
(Single leader-follower game) In the case N = 1, with a single follower and one leader, our problem reduces to the classical single-leader single-follower game. The leader-follower (Stackelberg) game was proposed in 1934 by H. von Stackelberg [23], who defined the concept of a hierarchical solution for markets in which some firms have more power than others and thus dominate them. This solution concept is termed the Stackelberg equilibrium. An early study of stochastic Stackelberg differential games (SSDGs) was conducted by Basar [2]. Another relevant study was performed by Yong [26], where an LQ leader-follower stochastic differential game (SDG) was introduced and studied in the open-loop information case. The setting in [26] is general: the coefficients of the system and the cost functionals may be random, the controls enter the diffusion term of the state dynamics, and the weight matrices for the controls in the cost functionals are not necessarily positive definite. In a similar but nonlinear setting, Bensoussan, Chen, and Sethi [3] obtained global maximum principles for both open-loop (OL) and closed-loop (CL) SSDGs, but the diffusion term there did not contain the controls, which simplifies the related analysis to a certain extent. In the special LQ setting, the solvability of the related Riccati equations is also discussed, and a state-feedback Stackelberg equilibrium is thus obtained.
So far, almost all related research on mean-field Stackelberg games has been based on an SDE system state. To the best of our knowledge, the first paper to study a BSDE system state is that of Huang, Wang, and Wu [10], and the present paper can be regarded as its follow-up work. We formulate more general LQ mean-field Stackelberg games with a BSDE system state. Unlike a forward SDE with a given initial condition, a BSDE has its terminal condition specified a priori, and its solution is an adapted process pair. Linear BSDEs were first introduced by Bismut [5], and the general nonlinear BSDE was first studied by Pardoux and Peng [18]. BSDEs have been applied broadly in many fields such as mathematical economics and finance, decision making, and management science. One example is the representation of stochastic differential recursive utility by a class of BSDEs (Wang and Wu [24], etc.). A BSDE coupled with an SDE through the terminal condition forms a forward-backward stochastic differential equation (FBSDE). FBSDEs have also been well studied; interested readers may refer to [7, 25-28].
The modeling of the leader agent by a BSDE and the follower agents by forward SDEs is well motivated and can be illustrated by the following example. Today, the government announces a target interest-rate adjustment for the next five years. The related banks and individuals then try to find optimal investment plans based on the announcement. The government, however, knows that the banks and individuals will carry out their own investment plans according to its announcement, so it can adjust the announcement to optimize its own goal. This is a typical mean-field Stackelberg game with the leader agent modeled by a BSDE and the follower agents modeled by forward SDEs. This model setting has its own strengths in applications. In practice, the leader sets a goal or target for the group, and the followers in the group seek optimal plans to achieve it. The cost functionals they consider may differ; the dynamics of the leader becomes a BSDE, while the dynamics of the followers are a family of SDEs. Traditional studies of leader-follower problems are all based on SDE dynamics and cannot represent such cases.
The modeling of a backward leader and forward followers yields a large-population system driven by a backward-forward stochastic differential equation (BFSDE), which is structurally different from an FBSDE in the following aspects. First, the forward and backward equations are coupled in their initial rather than terminal conditions. Second, unlike the FBSDE case, there is no feasible decoupling via standard Riccati equations, as addressed in Lim and Zhou (2001) [16]. This is mainly because some implicit constraints on the initial conditions would have to be satisfied in any possible decoupling.
The introduction of the BFSDE also brings some technical differences to its MFG analysis: the consistency condition derived in our backward-leader and forward-followers setup has a more complicated coupling structure. The standard MFG procedure mainly consists of the following steps. Step 1: Fix the decision of the leader, denoted by (x_0, u_0). Given these fixed quantities, introduce and solve the mean-field subgame faced by all followers, who are also competitive inside their interaction cycle. For this subgame, an auxiliary problem can be constructed and decentralized responses of the followers can be derived; the related mass-limit response of the followers is denoted by x̄ = x̄(x_0, u_0).
Step 2: Given the response functional x̄ of the followers, solve the decentralized stochastic control problem of the leader A_0, and denote the optimal solution pair by (x̄_0, ū_0) = (x̄_0(x̄), ū_0(x̄)).
Step 3: Derive the consistency condition (CC) system to specify x̄; then all decentralized strategies for the leader and the followers can be designed sequentially, and an approximate Nash equilibrium can be obtained.
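The three steps above amount to a fixed-point (consistency) problem for the frozen mean-field trajectory. The following minimal numerical sketch illustrates the iteration; the scalar dynamics, coefficients, and the function `best_response_mean` are purely hypothetical stand-ins for the followers' optimal responses, not the model of this paper.

```python
import numpy as np

T, n_steps = 1.0, 100
t = np.linspace(0.0, T, n_steps)

def best_response_mean(x_bar, a=-1.0, c=0.5, x0=1.0):
    """Mean of the followers' closed-loop states with the mean field frozen
    at x_bar: dx/dt = a*x + c*x_bar(t) is a hypothetical stand-in for the
    Step-1 optimal response (Euler discretization)."""
    dt = t[1] - t[0]
    x = np.empty_like(x_bar)
    x[0] = x0
    for k in range(len(t) - 1):
        x[k + 1] = x[k] + (a * x[k] + c * x_bar[k]) * dt
    return x

# Step 3: enforce the consistency condition x_bar = Phi(x_bar) by iteration;
# the map contracts here because |c|*T < 1.
x_bar = np.zeros(n_steps)
for _ in range(100):
    x_bar_new = best_response_mean(x_bar)
    converged = np.max(np.abs(x_bar_new - x_bar)) < 1e-12
    x_bar = x_bar_new
    if converged:
        break

# at the fixed point, the frozen mean field equals the realized mean state
print(np.max(np.abs(best_response_mean(x_bar) - x_bar)))
```

At the fixed point, the frozen trajectory supplied to the followers coincides with the average of their realized states, which is exactly the CC qualification of Step 3.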
The main contributions of this paper can be summarized as follows:
• We formulate a general backward-leader and forward-followers LQ mean-field game, which has a range of applications in reality.
• We derive the CC system, which is represented by a fully coupled mean-field-type backward-forward stochastic differential equation (BFSDE) in the open-loop case.
• The existence and uniqueness of the related CC system is investigated in the globally solvable case.
The rest of this paper is organized as follows. Section 2 provides the problem formulation and presents some preliminaries. In Sect. 3, we introduce the auxiliary limiting LQG optimization problems for the MFG analysis. In Sect. 4, we discuss the open-loop strategies of the Stackelberg game. In Sect. 5, we determine the CC system based on the open-loop analysis, which yields fully coupled BFSDEs. Section 6 is devoted to verifying the approximate equilibrium property of the open-loop decentralized strategies.

Preliminaries and problem formulation
The following notation is used throughout this paper. Let R^n denote the n-dimensional Euclidean space, R^{n×m} the set of all (n × m) matrices, and S^n the set of all (n × n) symmetric matrices. We denote the transpose by the superscript ⊤, the inner product by ⟨·, ·⟩, and the norm by |·|. For t ∈ [0, T] and a Euclidean space H, we introduce the corresponding function spaces and the spaces of processes or random variables on a given filtered probability space. On a given finite decision horizon [0, T], let (Ω, F, {F_t}_{0≤t≤T}, P) be a complete filtered probability space on which a (1 + N)-dimensional standard Brownian motion {W_0(t), W_i(t); 1 ≤ i ≤ N}_{0≤t≤T} is defined. We denote by {F_t}_{0≤t≤T} the natural filtration generated by {W_0(·), W_i(·), x_{i0}; 1 ≤ i ≤ N}, augmented by all P-null sets in F; it captures the full information of all agents. {F^{W_0}_t}_{0≤t≤T} is the natural filtration generated by W_0(·), augmented by all P-null sets in F; it captures the information of the leader agent. {F^{W_i}_t}_{0≤t≤T} is the natural filtration generated by W_i(·), augmented by all P-null sets in F; it captures the information of the ith follower agent. {F^i_t}_{0≤t≤T} is the natural filtration generated by {W_0(·), W_i(·)}, augmented by all P-null sets in F. In this paper, we consider a large-population system involving (1 + N) individual agents (with N sufficiently large) of two types: the leader agent A_0 and the follower agents {A_i}_{i=1}^N. Their states are given sequentially by the following controlled linear backward stochastic differential equation (BSDE, for short) and controlled linear forward stochastic differential equations (SDE or FSDE, for short), respectively.
x_0(T) = ξ, (2.1)
and A_i: (2.2). Here x^{(N)}(·) is called the state average or mean-field term of all follower agents, and x_{i0} is the initial value of A_i. In this paper, for simplicity, we assume that the state and control processes are both one-dimensional. We introduce the following assumption:
(H1) The initial states x_{i0} are independent and identically distributed (i.i.d., for short).
It follows that (2.1) admits a unique adapted solution for every u_0 ∈ U_0[0, T] (refer to Pardoux and Peng [18]). It is also well known that under (H1), (2.2) admits a unique adapted solution for every u_i ∈ U_i[0, T], 1 ≤ i ≤ N. Now we can formulate the large-population dynamic optimization problem.

Problem (I)
Find the optimal strategies ū = (ū_0, ū_1, . . . , ū_N) satisfying the optimality criteria. We notice that all agents are coupled not only through their state processes but also through the state averages in their cost functionals. Roughly speaking, the game to be studied proceeds as follows. First, the leader A_0 announces his strategy u_0(·) and commits to fulfilling it. Next, the followers A_i provide their best responses accordingly, minimizing their cost functionals J_i(u_i(·), u_{-i}(·)). This yields best-response functionals for the followers depending on the control law of the leader. With these functionals in mind, before the announcement, the agent A_0 designs his best response to minimize his own cost functional J_0(u_0(·), u_{-0}(·)). Noting the weak coupling among the agents in a large-population system, the above game is essentially a high-dimensional Stackelberg-Nash differential game. The influence of individual agents (leader or followers) on the population should be averaged out as the population size tends to infinity.
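The announce-then-respond order of play can be illustrated by a static one-follower caricature. All cost shapes and the names `follower_br` and `leader_cost` below are hypothetical illustrations, not this paper's model: the follower's best response is computed first, and the leader optimizes while anticipating it.

```python
import numpy as np

# Static one-follower caricature of the Stackelberg order of play.
c, target = 2.0, 3.0  # hypothetical coefficients

def follower_br(u0):
    # follower minimizes (u_i - c*u0)^2, so the best response is u_i = c*u0
    return c * u0

def leader_cost(u0):
    ui = follower_br(u0)  # the leader anticipates the follower's response
    return u0 ** 2 + (ui - target) ** 2

# closed form for this toy: minimize u0^2 + (c*u0 - target)^2 over u0
u0_star = c * target / (1 + c ** 2)

# brute-force check on a grid
grid = np.linspace(-10.0, 10.0, 200001)
u0_num = grid[np.argmin(leader_cost(grid))]
print(u0_star, float(u0_num))  # both close to 1.2
```

The point of the sketch is purely structural: the leader's optimization is over its own control only, with the follower's reaction substituted in, which is the hierarchical (Stackelberg) feature that distinguishes this game from a simultaneous-move Nash game.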

The limiting optimal control problem
Let us introduce the auxiliary limiting LQG optimization problems. First, as N → +∞, we suppose that x^{(N)}(·) can be approximated by an F^{W_0}_t-measurable function x̄(·). The state process of the follower then becomes the limiting dynamics, with the following auxiliary cost functionals. We then introduce the auxiliary limiting LQG optimization problems for the followers.

Problem (II)
For given x_{i0}, an F^{W_0}_t-measurable function x̄(·), and the control u_0(·) of the leader agent A_0, find the optimal response functional ū_i(·) ∈ U_i[0, T] of the following differential game among the followers. The analysis of Problem (II) can be further decomposed into substeps using MFG theory.
Step 1 (SOC-F): Consider the Nash equilibrium response functional of Problem (II) for the representative follower agent, denoted by ū_i[·, ·]: for given x_{i0}, an F^{W_0}_t-measurable function x̄(·), and the control u_0(·) of the leader, find ū_i(·) ∈ U_i[0, T] for the following Nash differential game among the followers. Step 2 (CC-F): Apply the state-aggregation method to determine the state-average limit x̄ by the following consistency condition qualification. By virtue of these steps, the Nash equilibrium response functional of the followers and x̄ = x̄(u_0) can be specified for any admissible strategy announced by the leader. Given the optimal responses of all followers, we can then turn to the problem of the leader.

Optimal strategy of auxiliary problems
From now on, we may suppress the time variable t when no confusion occurs. As mentioned before, we first focus on the auxiliary limiting LQG optimization problems, i.e., Problem (II).

Optimal strategy of the follower
The main result of this section can be stated as follows.
satisfies the following stationarity condition: (ii) for i = 1, 2, . . . , N, the following convexity condition holds. Equivalently, let the mapping be an adapted solution to FBSDE (4.1). For any u_i(·) ∈ U_i[0, T] and ε ∈ R, let x^ε_i(·) be the solution to the following perturbed state equation on [0, T]. Then, denoting by x_i(·) the solution of (4.4), we have x^ε_i(·) = x̄_i(·) + ε x_i(·) and the corresponding expansion of the cost. On the other hand, applying Itô's formula to x_i ȳ_i and taking expectation, we obtain the identity below. Hence, it follows that optimality holds if and only if (4.2) and (4.3) hold.
Since R > 0 by assumption, the optimal response follows, so the related Hamiltonian system can be represented with x̄_i(0) = x_{i0}, ȳ_i(T) = H x̄_i(T), i = 1, 2, . . . , N.
Based on the above analysis, we have (4.6). Here, the first equality of (4.6) is due to the consistency condition, by which the frozen term x̄(·) should equal the average limit of all realized states x̄_i(·); the second equality is due to the law of large numbers conditional on the common noise. Thus, replacing x̄ with E[x̄_i], we get the following system. As all agents are statistically identical, we may suppress the subscript "i", and the following consistency condition system arises for a "representative" agent, where W(·) stands for a generic Brownian motion on (Ω, F, P) that is independent of W_0.
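The law-of-large-numbers step behind (4.6) can be checked numerically. In the following toy sketch (all coefficients hypothetical), followers share a common noise W_0 and carry idiosyncratic noises W_i; the empirical state average over N followers tracks the conditional mean E[x_i | F^{W_0}], which solves the same linear dynamics with the idiosyncratic term averaged out, and the gap shrinks as N grows.

```python
import numpy as np

rng = np.random.default_rng(0)
a, sigma, sigma0 = -0.5, 0.3, 0.2   # hypothetical coefficients
T, n = 1.0, 200
dt = T / n
sq = np.sqrt(dt)

gaps = {}
for N in (100, 10_000):
    x = np.ones(N)   # followers start from a common point for simplicity
    m = 1.0          # conditional mean given the common noise W_0
    gap = 0.0
    for _ in range(n):
        dW0 = sq * rng.standard_normal()    # common noise increment
        dW = sq * rng.standard_normal(N)    # idiosyncratic increments
        x += a * x * dt + sigma * dW + sigma0 * dW0
        m += a * m * dt + sigma0 * dW0      # no idiosyncratic term
        gap = max(gap, abs(float(x.mean()) - m))
    gaps[N] = gap

print(gaps)  # the gap shrinks roughly like 1/sqrt(N)
```

This is exactly the mechanism by which the frozen term x̄(·) remains random (it is F^{W_0}_t-measurable) while the idiosyncratic fluctuations average out.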
Here x is a representative element of {x_{i0}}_{1≤i≤N}, and x_0(·) is a quantity that needs to be determined by further consistency condition analysis, to be given later.

Optimal strategy of the leader
Once Problem (II) is solved, we turn to finding the optimal control of the leader (agent A_0). Note that when the followers apply their optimal responses ū_i(·) given by (4.5), the leader ends up with the following state equation system (4.9), with the corresponding cost functional given below. We present the optimal control problem for the leader as follows.

Problem (III)
When the followers apply their optimal responses ū_i(·) given by (4.5), find the optimal control ū_0(·) ∈ U_0[0, T] minimizing the leader's cost functional. The main result of this section can be stated as follows.

It follows that optimality holds if and only if (4.12) and (4.13) hold.
Since R_0 > 0, we can further compute the optimal control for the leader agent, and thus finally obtain the consistency condition for the auxiliary problems as follows:

The consistency condition system
By the results in the last section, we can obtain the optimal responses of the followers and the optimal control of the leader once the well-posedness of the coupled BFSDE (4.16) is established.
In this section, we verify its well-posedness (refer to [19]), since it is important for the decentralized strategy design. To obtain the well-posedness of (4.16), we impose the following assumption: (H2) B_0 = 0, H_0 > 0, Q > 0. Proof Uniqueness. For notational convenience, in (4.16) we denote by b(φ), σ(φ) the coefficients of the drift and diffusion terms, respectively, for φ = ȳ_0, x̄, k, and by f(ψ) the generator for ψ = x̄_0, p̄, ȳ.

Then, for any
In the following, we first show that (4.16) admits at most one adapted solution.
Existence. To prove existence, we first consider the following family of FBSDEs parameterized by γ ∈ [0, 1]. Clearly, when γ = 1, the existence of a solution to (5.1) implies that of (4.16). When γ = 0, it is easy to see that (5.1) admits a unique solution; indeed, the resulting two-dimensional FBSDE is very similar to the Hamiltonian system of Lim and Zhou (2001) [16].
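The contraction-mapping idea behind the existence argument can be sketched on a deterministic two-point boundary-value analogue of such a coupled system. All coefficients below are hypothetical, and the horizon is taken small so that the Picard map (backward pass with the forward component frozen, then a forward pass) contracts.

```python
import numpy as np

# Deterministic analogue of a coupled forward-backward system:
#   x'(t) =  a x(t) + b p(t),   x(0) = x0      (forward)
#   p'(t) = -q x(t) - a p(t),   p(T) = h x(T)  (backward)
a, b, q, h, x0 = 0.5, -1.0, 1.0, 1.0, 1.0
T, n = 0.5, 500
t = np.linspace(0.0, T, n)
dt = t[1] - t[0]

x = np.full(n, x0)
p = np.zeros(n)
for it in range(200):
    # backward pass: integrate p from T down to 0 with x frozen
    p_new = np.empty(n)
    p_new[-1] = h * x[-1]
    for k in range(n - 1, 0, -1):
        p_new[k - 1] = p_new[k] + (q * x[k] + a * p_new[k]) * dt
    # forward pass: integrate x from 0 to T with the updated p
    x_new = np.empty(n)
    x_new[0] = x0
    for k in range(n - 1):
        x_new[k + 1] = x_new[k] + (a * x_new[k] + b * p_new[k]) * dt
    err = max(np.max(np.abs(x_new - x)), np.max(np.abs(p_new - p)))
    x, p = x_new, p_new
    if err < 1e-12:
        break

print(abs(p[-1] - h * x[-1]))  # the boundary coupling holds at the fixed point
```

The continuation in γ plays the same role as shrinking the horizon here: it keeps each step of the iteration within the contractive regime so that the fixed point, i.e., the adapted solution, exists globally.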
Then it follows that, in particular, the case γ = 1 corresponding to φ_i yields the desired existence. We have characterized the decentralized strategies {ū_i}_{0≤i≤N} of Problem (I) through the auxiliary Problem (II) and the consistency condition system. Now we turn to verifying the ε-Nash equilibrium property of these decentralized strategies. We first present the definition of an ε-Nash equilibrium.
We then estimate the SDE part of (6.3). Using the Burkholder-Davis-Gundy (BDG) inequality, there exists a constant M, independent of N, such that the following bound holds for any t ∈ [0, T], and by Gronwall's inequality we obtain the following.
Proof In fact, we have the stated identity. Proof Let us first consider the leader agent A_0. Recalling (2.4) and (4.10), we have (6.6). By Hölder's inequality and Lemma 6.3, there exists a constant M, independent of N, such that the corresponding bound holds. Combining (6.6), (6.7), and Lemma 6.4, there exists a constant M, independent of N, such that the desired estimate holds. The remaining claims for the followers can be proved in the same way.
Remark 6.6 We denote by M a common generic constant for the different bounds. In the above lemmas, the constant M may vary from line to line, but it is always independent of the number N of follower agents.

Leader agent's perturbation
In this subsection, we prove that the set of control strategies (ū_0, ū_1, . . . , ū_N) given by Theorem 6.2 is an ε-Nash equilibrium of Problem (I) for the leader agent A_0, i.e., there exists ε = ε(N) ≥ 0 with lim_{N→∞} ε(N) = 0 such that the equilibrium inequality holds. Let us consider the case where the leader agent A_0 applies an alternative strategy u_0 while each follower agent A_i uses the control ū_i(t) = -R^{-1}(B ȳ_i(t) + D z̄_i(t)). To prove that (ū_0, ū_1, . . . , ū_N) is an ε-Nash equilibrium for the leader agent, we need to show that for every admissible alternative control u_0, inf_{u_0 ∈ U_0[0,T]} J_0(u_0, ū_{-0}) ≥ J_0(ū_0, ū_{-0}) - ε. It thus suffices to consider perturbations u_0 ∈ U_0[0, T] such that J_0(u_0, ū_{-0}) ≤ J_0(ū_0, ū_{-0}). By the representation of the cost functional in [21, 28], we can represent the cost functional as follows. Remark 6.8 Here, in fact, we have N_0 = R_0, which is assumed to be a positive number, so we clearly obtain (6.9). For a more complicated cost functional, we could use the representation of the cost functional in [21, 28]; in this paper we can actually avoid this tool, and we merely indicate the method in case the problem is less transparent. Proof Recalling (2.4) and (4.10), we have (6.15). Taking advantage of Lemmas 6.5 and 6.11, we can give the second part of the proof of Theorem 6.2, i.e., that the set of control strategies (ū_0, ū_1, . . . , ū_N) given by Theorem 6.2 is an ε-Nash equilibrium of Problem (I) for each of the follower agents.
Part B of the proof of Theorem 6.2 Combining Lemmas 6.5 and 6.11, we obtain the chain of inequalities, where the second inequality comes from the fact that J_i(ū_i) = inf_{u_i ∈ U_i[0,T]} J_i(u_i). Consequently, Theorem 6.2 holds for each of the follower agents with ε = O(1/√N). Combined with Part A, this completes the proof of Theorem 6.2.
Remark 6.12 So far, we have obtained the optimal strategy from the BFSDE, but in this case we cannot introduce a Riccati-type equation to decouple the system. One may then ask how to apply the results in practice. Fortunately, many existing methods allow for explicit computation.
In the field of numerical algorithms and simulation for BSDEs, Peng and Xu [20] studied the convergence of an explicit scheme based on approximating Brownian motion by a random walk, which is efficient to implement, and they developed a software package for BSDEs based on this algorithm. Recently, Sun, Zhao, and Zhou [22] proposed an explicit θ-scheme for MF-BSDEs; further results on MF-FBSDE simulation and numerical methods can be found in their related works.
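To give the flavor of such random-walk schemes, the following sketch runs a binomial-tree backward recursion for a simple linear BSDE whose value is known in closed form. It is only an illustration in the spirit of approximating Brownian motion by a scaled Bernoulli walk, not the algorithm or package of [20]; the BSDE and its coefficients are chosen purely so that a benchmark exists.

```python
import numpy as np

# Linear BSDE:  -dy = a*y dt - z dW,  y(T) = W_T**2,
# whose value at time 0 is y(0) = exp(a*T) * T.
a, T, n = 0.5, 1.0, 200
dt = T / n
sq = np.sqrt(dt)

# terminal layer of the recombining tree: node j at step n has W = (2j - n)*sqrt(dt)
j = np.arange(n + 1)
y = ((2 * j - n) * sq) ** 2

for k in range(n, 0, -1):
    # conditional expectation on the walk: each child has probability 1/2
    ey = 0.5 * (y[1:] + y[:-1])
    # explicit Euler step for the driver f(y) = a*y
    y = ey * (1.0 + a * dt)

y0 = float(y[0])
print(y0, np.exp(a * T) * T)  # the two values should be close
```

Because the walk matches the first two moments of the Brownian increments, the backward recursion converges to the BSDE value as the number of time steps grows; here the discretization error is of order dt.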
Another common method for computing the solutions of FBSDEs is to solve the related partial differential equations (PDEs); one of the most famous approaches is the four-step scheme introduced by Ma, Protter, and Yong [17]. By virtue of a quasilinear parabolic PDE, the adapted solution can be obtained under suitable conditions. We refer to Chap. 9 of [28] for more details on these numerical methods.