
Distributed constrained optimization via continuous-time mirror design


Recently, distributed convex optimization using a multiagent system has received much attention from researchers. This problem is frequently approached by combining the consensus algorithms in the multiagent literature with the gradient algorithms in the convex optimization literature. Compared with unconstrained distributed optimization, the constrained case is more challenging, and it is usually tackled by the projected gradient method. However, the projected gradient algorithm involves a projection nonlinearity and thus is hard to analyze. To avoid gradient projection, in this paper, we present a novel distributed convex optimization algorithm in continuous time by using mirror design. The resulting optimization dynamics is smooth, uses no gradient projection, and is designed in a primal-dual framework, where the primal and dual dynamics are respectively aided by the mirror descent and the mirror ascent. The merit of the mirror design is that it avoids gradient projection in the optimization dynamics and thereby removes the difficulty of analyzing the projection nonlinearity. Furthermore, the mirror-based primal-dual optimization dynamics facilitates a more convenient construction of Lyapunov functions in the stability analysis.


Optimization is an important field in mathematics, and many engineering applications can be converted into optimization problems [1–10]. Recent years have witnessed increasing attention to distributed convex optimization using multiagent systems [11, 12], which is motivated by the emergence of large-scale networks such as internet networks, wireless sensor networks, and mobile ad hoc networks. Distributed convex optimization refers to minimizing the aggregate sum of N convex cost functions by designing N dynamics, where each dynamics is located on one node and has access only to the information of one cost function and the states of its neighboring dynamics. The objective is that all the states of the N dynamics consensually and asymptotically converge to the minimizer of the total objective function. The optimization is solved in a distributed way since each local optimization dynamics, termed a node of the network, uses only information from its neighbors. The distributed optimization problem has been investigated from different perspectives; refer to [11, 13–15] and references therein for details.

For the distributed optimization problem, many useful algorithms have been reported in the literature, such as the distributed primal-dual gradient algorithm [16], the nonsmooth-analysis-based algorithm [17], and the approximate-projection-based algorithm [18]. Of particular interest among them is the distributed gradient projection method [11, 19–22], which requires computing the projection of the gradient. To overcome this difficulty, we propose a novel distributed convex optimization algorithm utilizing the mirror ascent/descent method. The original mirror ascent/descent method was proposed by Nemirovski and Yudin [23] and later developed in a series of papers [24, 25]. However, most of these algorithms are discrete-time algorithms [20, 26–28], with relatively few paying attention to the continuous-time case [29]. For distributed mirror descent algorithms, the continuous-time case is more attractive than the discrete-time case since it facilitates the use of the elegant Lyapunov argument [30] in the convergence analysis and allows tools of differential geometry to be used in constrained optimization problems [31].

Along the line of utilizing continuous-time mirror descent algorithms for the distributed convex optimization problem, the work [32] was one of the earliest contributions; however, no proof of convergence was given there. Later, the authors of [33] presented a proof by using tools from nonsmooth analysis and set-valued dynamical systems, but with the limitation of considering only the simple case of unconstrained optimization, leaving the challenging case of constrained optimization untouched. The first contribution of this paper is to tackle the hard problem of constrained optimization under the framework of continuous-time mirror descent.

Generalization of the above works on continuous-time distributed mirror descent algorithms from the unconstrained case to the constrained one is not a simple task. By reviewing some commonly used constrained optimization algorithms and pointing out their inherent disadvantages, in this paper, we apply the continuous-time mirror descent to design a novel optimization algorithm that overcomes these disadvantages. It is well known that a distributed optimization problem becomes considerably harder when constraints are taken into account. To handle the constraints, a variety of approaches can be used, including the logarithmic barrier method [32], the Lagrangian multiplier method [15], the projected consensus algorithm [11], the penalty-based method [34], and the global linearization approach [9]. Among them, the primal-dual method is the most popular. It transforms the constrained optimization problem into an equivalent unconstrained one by designing the primal and dual dynamics. We note that the dual dynamics designed in this way is nonsmooth due to the projection operator used to keep the evolution of the Lagrangian multipliers within the nonnegative orthant, and therefore it is difficult to analyze. As another contribution of this paper, we pursue a novel line of designing the dual dynamics via the mirror descent method. The merit of our design is that it avoids gradient projection, and furthermore the resulting optimization dynamics is smooth.

Aside from modifying the nonsmooth dual dynamics (i.e., the dynamics for the Lagrangian multipliers) in the existing literature into a smooth version by borrowing the idea of the mirror descent algorithm, we also redesign the primal dynamics in the existing literature in a mirror descent way. Our smooth dual dynamics is designed in such a way that if the initial value of the dual variable is positive, then the multipliers stay positive for all time along the evolution of the dual dynamics. Such a design has the benefit that positive system theory can be utilized for convergence analysis. The primal dynamics is redesigned so that the theory of Bregman divergence and Fenchel coupling can be utilized in the stability analysis. The stability of the resulting primal-dual optimization dynamics is analyzed by constructing a Lyapunov function, which is exactly the Fenchel coupling of the Bregman function; for details, refer to the explicit form of our Lyapunov function in the proof of Theorem 3. The construction of a new Lyapunov function and the corresponding stability analysis constitute the last contribution of this paper.

In summary, the main novelties of this paper are as follows. Firstly, we obtain a continuous-time and distributed version of the mirror descent algorithm, which complements the existing discrete-time distributed optimization algorithms. Secondly, the results in this paper consider the more challenging case of constrained distributed mirror descent algorithms, extending the existing results, which only deal with the unconstrained case. Thirdly, in comparison with existing results, our method avoids using gradient projection in the optimization algorithm design. Therefore, it removes the difficulty of analyzing the resulting nonsmooth optimization dynamics and makes the simulation easier. Fourthly, the frequently used primal-dual algorithm for the optimization problem in the existing literature is modified in our paper via the mirror descent method, giving rise to new primal and dual dynamics. The modified primal dynamics facilitates the use of Bregman divergence and Fenchel coupling in the stability analysis, and the redesigned dual dynamics for the evolution of the positive Lagrange multipliers does not include projection and therefore reduces the complexity of the convergence analysis. Also, the construction of the Lyapunov function and the corresponding stability analysis are novel.

The rest of this paper is organized as follows. The problem is formulated in Sect. 3. In Sect. 4, we introduce the distributed mirror descent algorithm and use it in the primal and dual dynamics design and in the corresponding convergence analysis. The simulation results supporting our theoretical results are presented in Sect. 5. In Sect. 6, we summarize the paper.


A. Notation. By \(\mathbb{R}\) and \(\mathbb{R}_{+}\) we denote the sets of real and nonnegative real numbers, respectively; \(\mathbb{R}^{n}\) and \(\mathbb {R}^{n}_{+}\) are the sets of n-dimensional vectors and n-dimensional vectors with nonnegative components, respectively. The norm of a vector \(x\in{\mathbb{R}^{n}}\) is denoted by \(\lVert x \rVert= \sqrt{{\sum_{i=1}^{n}{x_{i}^{2}}}} \). By \(x\prec0\) (\(x\preceq0\)) for a vector x we mean that each entry of x is less than (less than or equal to) zero. For two vectors \(x, y\in\mathbb{R}^{n}\), their inner product is defined as \(\langle x, y\rangle= x^{T}y\). We denote \(\mathbf{1}_{n}=(1,1,\ldots, 1)^{T} \in{\mathbb{R}^{n}}\) and \(\mathbf{0}_{n}=(0, 0,\ldots, 0)^{T} \in{\mathbb{R}^{n}}\). For a set of vectors \(x_{1}, \ldots, x_{N} \in {\mathbb {R}}^{n}\), we denote \(\operatorname{col}\{x_{1},\ldots, x_{N}\}=(x_{1}^{T}, \ldots, x_{N}^{T})^{T}\). For a number \(a \in {\mathbb {R}}\), the projection \(P_{+}[a]\) is defined to be zero if \(a<0\) and a if \(a \geq0\). For a vector x, its projection \(P_{+}[x]\) is defined componentwise. The n-dimensional identity matrix is denoted by \(I_{n} \). For arbitrary matrices A and B, \(A \otimes B\) denotes their Kronecker product. The eigenvalues of a matrix \(A \in {\mathbb {R}}^{n \times n}\) are denoted by \(\lambda_{i}(A)\) for \(i \in\{1,\ldots, n\}\). For an integer \(k \geq1\), we denote by \(C^{k}\) the set of k times continuously differentiable functions. For \(f: \mathbb{R}^{n} \rightarrow\mathbb{R}\), \(\nabla f(x)=(\frac {\partial f(x)}{\partial x_{1}}, \ldots, \frac{\partial f(x)}{\partial x_{n}})^{T}\) is its gradient, and for \(L: \mathbb{R}^{n} \times \mathbb{R}^{m}\rightarrow{\mathbb{R}} \), we denote \(\nabla_{\lambda}L(x,\lambda) = (\frac{\partial L(x, \lambda )}{\partial\lambda_{1}}, \frac{\partial L(x, \lambda)}{\partial \lambda_{2}}, \ldots ,\frac{\partial L(x, \lambda)}{\partial\lambda_{m}})^{T}\).
We say that \(f: {\mathbb {R}}^{n} \rightarrow {\mathbb {R}}\) is a convex function if for any \(x, y \in{\mathbb{R}}^{n}\) and \(0 \leq\lambda\leq1\), it satisfies \(f((1-\lambda)x +\lambda y) \leq(1-\lambda)f(x) + \lambda f(y)\). For a differentiable f, convexity implies \(\nabla f(x)^{T} (y-x) \leq f(y)-f(x)\). Furthermore, strict convexity of f is equivalent to \(\nabla f(x)^{T} (y-x) < f(y) -f(x)\) for all \(x \neq y\), and it is guaranteed when f is twice differentiable with \(\nabla^{2} f(x) \succ 0\) for all x.

B. Graph theory. Consider a graph \({G}=(\mathcal{V}, \mathcal{E})\), where \(\mathcal{V}=\{1,2,\ldots ,N\}\) is the set of nodes representing N agents, and \(\mathcal{E} \subset\mathcal{V} \times \mathcal{V}\) is the set of edges of the graph. An edge of G is denoted by \((i, j)\), which means that agents i and j can exchange information with each other. A graph is undirected if the edges \((i,j)\) and \((j,i)\) in \(\mathcal{E}\) are considered to be the same; otherwise, the graph is directed. The set of neighbors of node i is denoted by \(\mathcal{N}_{i}=\{j\in\mathcal{V}: (j,i)\in\mathcal {E}, j\neq i\}\). The adjacency matrix \(A=[a_{ij}]\) of a graph G on the vertex set \(\{ 1,\ldots,N\}\) is the \(N\times N\) matrix with off-diagonal elements \(a_{ij}=1\) if \((i,j)\) is an edge of G and \(a_{ij}=0\) otherwise, and with diagonal elements \(a_{ii}=-\sum_{j\in \mathcal {N}_{i}}a_{ij}\). The Laplacian matrix \(\mathcal {L} \in{\mathbb{R}^{N\times N}}\) of a graph \({G}=(\mathcal{V}, \mathcal {E})\) is defined as follows: \(\mathcal {L}_{ii} = \sum_{k\in \mathcal {N}_{i}} a_{ik}\), and \(\mathcal {L}_{ij} = -a_{ij}\) for \(i \neq j\). For any undirected graph, its Laplacian is symmetric positive semidefinite and satisfies \(\mathcal {L} \mathbf{1}_{N}=\mathbf{0}_{N}\). We say that an undirected graph is connected (and a directed graph is strongly connected) if there is a path between any pair of vertices. Furthermore, an undirected graph G is connected if and only if its Laplacian matrix has a simple zero eigenvalue.
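As a small illustration of these graph-theoretic facts, the following sketch builds the Laplacian of an undirected graph and checks that its rows sum to zero and that the quadratic form \(x^{T}\mathcal{L}x\) equals the sum of squared differences over the edges. The helper names `laplacian` and `quadratic_form`, the 0-based node labels, and the three-node path graph are assumptions made only for this example.

```python
def laplacian(n_nodes, edges):
    """Laplacian of an undirected graph given as a list of distinct edges.

    Diagonal entries are node degrees; off-diagonal entries are -1 per edge.
    """
    lap = [[0] * n_nodes for _ in range(n_nodes)]
    for i, j in edges:
        lap[i][j] -= 1
        lap[j][i] -= 1
        lap[i][i] += 1
        lap[j][j] += 1
    return lap


def quadratic_form(lap, x):
    """Evaluate x^T L x by direct summation."""
    n = len(x)
    return sum(x[i] * lap[i][j] * x[j] for i in range(n) for j in range(n))


# Hypothetical example: the path graph 0 - 1 - 2 (nodes are 0-indexed here).
EDGES = [(0, 1), (1, 2)]
LAP = laplacian(3, EDGES)

if __name__ == "__main__":
    print(LAP)                              # rows sum to zero
    print(quadratic_form(LAP, [1, 2, 4]))   # (1-2)^2 + (2-4)^2
```

Because \(x^{T}\mathcal{L}x=\sum_{(i,j)\in\mathcal{E}}(x_{i}-x_{j})^{2}\), the form is nonnegative and vanishes on a connected graph only when all entries of x agree, which is the property that consensus terms rely on.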

Problem formulation

Consider a network described by a graph \({G}=(\mathcal{V}, \mathcal {E})\), where \(\mathcal{V}= \{1,2, \ldots,N \}\) represents the set of N nodes, and \(\mathcal{E}\subset\mathcal{V}\times\mathcal{V}\) denotes the set of edges of the graph. For each node \(i \in\mathcal {V}\), there are a convex cost function \(f_{i}: \mathbb{R}^{n} \rightarrow{\mathbb{R}}\), a set of inequality constraints \(g_{ij}\leq 0\), \(j=1, \ldots, r_{i}\), and a set of equality constraints \(h_{ij}=0\), \(j=1, \ldots, s_{i}\), where \(r_{i}\), \(s_{i}\) are positive integers, and \(g_{ij}: \mathbb{R}^{n} \rightarrow\mathbb{R}\), \(j=1, \ldots, r_{i}\), and \(h_{ij}: \mathbb{R}^{n} \rightarrow\mathbb{R}\), \(j=1, \ldots, s_{i}\), are all convex functions. If there are no constraints for agent i, we set \(g_{ij}(x)\equiv0\), \(j=1, \ldots, r_{i}\), and \(h_{ij}(x)\equiv0\), \(j=1, \ldots, s_{i}\). The global network cost function \(f: \mathbb{R}^{n} \rightarrow{\mathbb{R}}\) is defined as \(f(x)= \sum_{i=1}^{N} f_{i}(x)\). In this paper, we consider the following optimization problem:

$$ \textstyle\begin{cases} \mbox{minimize}& f(x)=\sum_{i=1}^{N}f_{i}(x), \\ \mbox{subject to}& g_{i}(x)\preceq0,\quad i=1, \ldots, N, \\ &h_{i}(x)=0,\quad i=1, \ldots, N, \end{cases} $$

where \(g_{i}=(g_{i1}, \ldots, g_{ir_{i}})^{T}\) and \(h_{i}=(h_{i1}, \ldots, h_{is_{i}})^{T}\). Obviously, \(g_{i}: \mathbb{R}^{n} \rightarrow\mathbb{R}^{r_{i}}\) and \(h_{i}: \mathbb{R}^{n} \rightarrow\mathbb{R}^{s_{i}}\); we also set \(r=r_{1}+\cdots +r_{N}\) and \(s=s_{1}+\cdots+s_{N}\). The optimization problem is to find \(x^{*}\in{\mathbb{R}}^{n}\) such that the objective function \(f(x)\) is minimized and the constraints \(g_{i}(x^{*}) \preceq0\) and \(h_{i}(x^{*}) = 0\) are satisfied. Such \(x^{*}\) is called the optimal solution, and the corresponding value \(f^{*}= f(x^{*})\) is called the optimal value.

This problem is usually solved by introducing a Lagrangian function and its saddle point, which are defined as follows.

Definition 1

(Lagrangian function [35])

The Lagrangian function associated with problem (3.1) is defined as a mapping \(L: {\mathbb{R}}^{n} \times{\mathbb{R}}^{r}_{+} \times{\mathbb {R}}^{s}\rightarrow{\mathbb{R}}\) specified by

$$ L(x,\lambda, \nu)=\sum_{i=1}^{N}f_{i}(x)+ \sum_{i=1}^{N}\sum _{j=1}^{r_{i}}\lambda_{ij}g_{ij}(x)+ \sum_{i=1}^{N}\sum _{j=1}^{s_{i}}\nu _{ij}h_{ij}(x), $$

where \(\lambda=\operatorname{col}\{\lambda_{1}, \ldots, \lambda_{N}\}\) with \(\lambda_{i} =\operatorname{col}\{\lambda_{i1}, \ldots, \lambda_{ir_{i}}\}\in {\mathbb{R}}^{r_{i}}_{+}\), and \(\nu=\operatorname{col}\{\nu_{1}, \ldots, \nu_{N}\}\) with \(\nu_{i} =\operatorname{col}\{\nu_{i1}, \ldots, \nu_{is_{i}}\}\in{\mathbb {R}}^{s_{i}}\). Obviously, \(\lambda\in\mathbb{R}^{r}_{+}\) and \(\nu\in \mathbb{R}^{s}\).

Definition 2

(Saddle point [36])

A couple \((x^{*}, (\lambda^{*}, \nu^{*}))\in{\mathbb{R}}^{n} \times ({\mathbb{R}}^{r}_{+} \times{\mathbb{R}}^{s})\) is a saddle point of the Lagrangian function L if it satisfies

$$ L\bigl(x^{*},(\lambda, \nu)\bigr)\leq L\bigl(x^{*},\bigl(\lambda^{*}, \nu^{*}\bigr) \bigr)\leq L\bigl(x,\bigl(\lambda^{*}, \nu^{*}\bigr)\bigr). $$

To the Lagrangian function \(L(x,\lambda, \nu)\), there corresponds the Lagrange dual function \(\Omega:{\mathbb{R}}^{r}_{+}\times{\mathbb{R}}^{s} \rightarrow{\mathbb{R}}\) defined as

$$ \Omega(\lambda,\nu)=\inf_{x\in{\mathbb{R}}^{n}} L(x, \lambda, \nu). $$

Obviously, for every \(\lambda\succcurlyeq0\) and \(\nu\in{\mathbb{R}}^{s}\) we have \(\Omega(\lambda, \nu)=\inf_{x\in{\mathbb{R}}^{n}} L (x, \lambda, \nu)\leq L(x^{*}, \lambda, \nu)\leq f(x^{*})= f^{*}\). So the dual function provides a lower bound for the optimal value. The best such lower bound is

$$ \rho^{*}=\sup_{\lambda\succcurlyeq0, \nu\in{\mathbb{R}}^{s}} \Omega(\lambda, \nu). $$

The couple of values \((\lambda^{*}, \nu^{*})\) satisfying \(\Omega(\lambda ^{*}, \nu^{*}) = \rho^{*}\) is called the dual optimal solution, whereas \(x^{*}\) achieving \(f(x^{*})= f^{*}\) is called the primal optimal solution. The case \(f^{*}=\rho^{*}\) can be guaranteed by imposing, for example, the Slater condition as follows.

Definition 3

(Slater’s constraint qualification certificate [36])

The Slater constraint qualification certificate is satisfied by (3.1) if there exists \(x\in\mathbb{R}^{n}\) such that \(g_{i}(x) \prec0\) and \(h_{i}(x)=0\) for \(i=1, \ldots, N\).
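To make the dual function, the lower bound, and the Slater condition concrete, consider the following sketch on a scalar toy problem that is not taken from the paper: minimize \(f(x)=x^{2}\) subject to \(g(x)=1-x\leq0\). Here the infimum defining Ω is attained at \(x=\lambda/2\), so \(\Omega(\lambda)=\lambda-\lambda^{2}/4\), and the values \(x^{*}=1\), \(\lambda^{*}=2\) follow by hand. All function names are hypothetical.

```python
def f(x):
    """Toy cost f(x) = x^2 (an assumption for illustration)."""
    return x * x


def g(x):
    """Toy inequality constraint g(x) = 1 - x <= 0."""
    return 1.0 - x


def lagrangian(x, lam):
    return f(x) + lam * g(x)


def dual(lam):
    """Omega(lam) = inf_x L(x, lam); the infimum is attained at x = lam / 2."""
    return lagrangian(lam / 2.0, lam)


# Primal and dual optimal solutions of the toy problem, computed by hand.
X_STAR, LAM_STAR = 1.0, 2.0
F_STAR = f(X_STAR)

if __name__ == "__main__":
    # Weak duality: every Omega(lam) with lam >= 0 is a lower bound on f*.
    print(max(dual(0.1 * k) for k in range(60)), "<=", F_STAR)
    # Slater condition: x = 2 is strictly feasible.
    print(g(2.0) < 0)
```

In this example the lower bound is tight, \(\Omega(\lambda^{*})=f^{*}\), which is consistent with the Slater condition being satisfied at the strictly feasible point \(x=2\).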

It can be shown that a saddle point \((x^{*}, (\lambda^{*}, \nu^{*}))\) of the Lagrangian \(L(x,\lambda, \nu)\) associated with problem (3.1) provides an optimal solution \(x^{*}\) to the optimization problem (3.1), but conversely, a primal optimal solution \(x^{*}\) together with a dual optimal solution \((\lambda^{*}, \nu^{*})\) does not necessarily provide a saddle point \((x^{*}, (\lambda^{*}, \nu^{*}))\) of \(L(x,\lambda, \nu)\). To relate the two, the following theorem, which can be found in [36, 37], is useful.

Theorem 1

If the pair \((x^{*}, (\lambda^{*}, \nu^{*}))\) is a saddle point for \(L(x,\lambda, \nu)\), then \(x^{*}\) is an optimal solution to problem (3.1). Conversely, if \(x^{*}\) is an optimal solution to problem (3.1), then there exists a couple of points \((\lambda^{*}, \nu^{*}) \in\mathbb{R}^{r}_{+} \times\mathbb{R}^{s}\) such that \((x^{*}, (\lambda^{*}, \nu^{*}))\) is a saddle point for \(L(x,\lambda, \nu)\).

Remark 1

According to this theorem, finding an optimal solution \(x^{*}\) of problem (3.1) transforms to seeking a saddle point \((x^{*}, (\lambda^{*}, \nu^{*}))\) of the Lagrangian \(L(x,\lambda, \nu)\) in (3.2). The latter amounts to minimizing the Lagrangian with respect to x and maximizing the Lagrangian with respect to \((\lambda , \nu)\).

Another useful concept characterizing the optimal solution to problem (3.1) is the KKT conditions stated in the following theorem [36].

Theorem 2

Suppose that the Slater conditions in Definition 3 are satisfied and \(f_{i}\), \(g_{i}\), \(h_{i}\), \(i = 1, \ldots, N\), are convex. Then \(x^{*}\) is a solution of (3.1) if and only if there exists \((\lambda^{*}, \nu^{*}) \in\mathbb{R}^{r}_{+} \times\mathbb{R}^{s}\) such that the following conditions (called the KKT conditions) hold:

$$ \mathrm{KKT}\mbox{:}\quad \textstyle\begin{cases} g_{ij}(x^{*}) \leq0,\quad j=1, \ldots, r_{i}, \\ h_{ij}(x^{*}) = 0,\quad j=1, \ldots, s_{i}, \\ \lambda_{ij}^{*}\geq0,\quad j=1, \ldots, r_{i}, \\ \lambda_{ij}^{*}g_{ij}(x^{*})=0,\quad j=1, \ldots, r_{i}, \\ \sum_{i=1}^{N} \nabla f_{i}(x^{*})+\sum_{i=1}^{N}\sum_{j=1}^{r_{i}}\lambda _{ij}^{*} \nabla g_{ij}(x^{*})+\sum_{i=1}^{N}\sum_{j=1}^{s_{i}}\nu _{ij}^{*}\nabla h_{ij}(x^{*})=0. \end{cases} $$
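The KKT conditions can be checked numerically at a candidate point. The sketch below does so for a hypothetical scalar example that is not part of the paper, namely \(f(x)=x^{2}\) with the single inequality constraint \(g(x)=1-x\leq0\) and no equality constraints; the function name `is_kkt_point` and the tolerance are assumptions.

```python
def is_kkt_point(x, lam, tol=1e-9):
    """Check the KKT conditions for: minimize x^2 subject to 1 - x <= 0."""
    g_val = 1.0 - x                      # constraint value g(x)
    grad_f = 2.0 * x                     # gradient of f(x) = x^2
    grad_g = -1.0                        # gradient of g(x) = 1 - x
    primal_feasible = g_val <= tol
    dual_feasible = lam >= -tol
    complementary = abs(lam * g_val) <= tol
    stationary = abs(grad_f + lam * grad_g) <= tol
    return primal_feasible and dual_feasible and complementary and stationary


if __name__ == "__main__":
    print(is_kkt_point(1.0, 2.0))   # the optimal pair of the toy problem
    print(is_kkt_point(2.0, 0.0))   # feasible, but not stationary
```

The second call shows that feasibility and complementary slackness alone are not enough: the stationarity condition in the last line of the KKT system must hold as well.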

The point \((x^{*}, \lambda^{*}, \nu^{*})\) obtained in Theorem 2 is called the KKT point. Motivated by this theorem, we will tackle the constrained optimization problem (3.1) by adopting the dynamical system approach. We formulate the problem in detail as follows.

Problem formulation: In what follows, we will design in a distributed way a continuous-time dynamics for \((x, \lambda, \nu)\) such that:

  • the dynamics is smooth by avoiding projection;

  • the equilibrium of this dynamics is exactly the saddle point or the KKT point;

  • the equilibrium of this dynamics is asymptotically stable;

  • the λ-subdynamics remains nonnegative all the time for a nonnegative initial condition.

Remark 2

In the literature, the gradient method is used to design the primal dynamics \(\dot{x} = -\nabla_{x} L(x, \lambda, \nu)\), and the projected gradient method is applied to design the dual dynamics \(\dot {\lambda}= P_{+}[\nabla_{\lambda}L(x, \lambda, \nu)]\), \(\dot{\nu}= \nabla_{\nu}L(x, \lambda, \nu)\). The λ-dynamics is obviously nonsmooth. To overcome the difficulty of the nonsmoothness in the dual dynamics, this paper proposes a mirror descent method, rather than the projected gradient method, to design a smooth λ-dynamics. This paper also extends the gradient x-dynamics to a mirror descent setup and the distributed framework.

Distributed mirror descent algorithm for constrained optimization

According to Theorems 1 and 2, the constrained optimization problem (3.1) can be transformed to solving the unconstrained optimization problem of minimizing the Lagrangian \(L(x,\lambda, \nu)\) with respect to x and maximizing \(L(x,\lambda, \nu)\) with respect to \((\lambda , \nu)\). Let us review the traditional way to tackle these minimization and maximization problems:

  • Minimization of \(L(x,\lambda, \nu)\) with respect to x can be realized by designing a dynamics following the gradient descent as \(\dot{x} = -\nabla_{x} L(x,\lambda, \nu)\), extended in a distributed way by including a consensus term to the following form:

    $$ \dot{x}_{i}=-\nabla f_{i}(x_{i})- \sum_{j=1}^{r_{i}}\lambda_{ij}\nabla g_{ij}(x_{i})-\sum_{j=1}^{s_{i}} \nu_{ij}\nabla h_{ij}(x_{i})+\sum _{j\in \mathcal{N}_{i}}(x_{j}-x_{i}). $$
  • Likewise, maximization of \(L(x,\lambda, \nu)\) with respect to \((\lambda , \nu)\) can be achieved by resorting to the projected gradient ascent method \(\dot{\lambda}= P_{+}[\nabla_{\lambda}L(x, \lambda, \nu)]\), \(\dot{\nu}= \nabla_{\nu}L(x, \lambda, \nu)\), or more specifically,

    $$\begin{aligned}& \dot{\lambda}_{ij}= P_{+}\bigl[g_{ij}(x_{i})\bigr], \quad j=1, \ldots, r_{i}, \end{aligned}$$
    $$\begin{aligned}& \dot{\nu}_{ij}=h_{ij}(x_{i}),\quad j=1, \ldots, s_{i}, \end{aligned}$$

    where the projection \(P_{+}\) is used to keep \(\lambda _{ij}(t)\) nonnegative at all times.

In this section, we borrow the mirror method to redesign the projected λ-dynamics in (4.2) such that the redesigned λ-dynamics is smooth and positively invariant with respect to \(\Lambda = \{\lambda\mid \lambda_{ij}\geq0, j= 1, \ldots, r_{i}, i= 1, \ldots, N\}\). More specifically, Sect. 4.1 is devoted to the general theory of continuous-time mirror descent, which is used to redesign the dual λ-dynamics in Sect. 4.2 and the x-dynamics in Sect. 4.3.

General theory on mirror descent

The mirror descent algorithm is devoted to the constrained minimization problem

$$ \min_{x \in\mathcal {X}}\digamma(x), $$

where \(\mathcal {X}\) is a convex compact set in \(\mathbb {R}^{n}\). To explain this algorithm, the following definitions are reviewed for later use.

Definition 4

(Distance-generating function [29])

A function \(\phi: \mathcal {X}\rightarrow \mathbb {R}\) is called a distance-generating function with modulus \(\alpha >0\) with respect to \(\|\cdot\|\) if ϕ is convex and continuous on \(\mathcal {X}\), the set \(\mathcal {X}^{o}=\{x\in\mathcal {X}\mid \partial\phi(x)\neq\emptyset\}\) of points with nonempty subdifferential is convex (note that \(\mathcal {X}^{o}\) contains the relative interior of \(\mathcal {X}\)), and ϕ restricted to \(\mathcal {X}^{o}\) is continuously differentiable and strongly convex with parameter α with respect to \(\|\cdot\|\) in the sense that \((y-x)^{T}(\nabla\phi(y)-\nabla\phi(x))\geq \alpha \|y-x\|^{2}\) for all \(x,y\in\mathcal {X}^{o}\).

Definition 5

(Bregman divergence [38])

A function \(B_{\phi}: \mathcal {X}^{o} \times\mathcal {X} \rightarrow {\mathbb {R}}_{+}\) defined by

$$ B_{\phi}(x,y)=\phi(y)-\phi(x)-\nabla\phi(x)^{T}(y-x) $$

is called the Bregman divergence (or prox-function) associated with ϕ.

In what follows, we use the Bregman divergence associated with \(\phi (x)= \sum_{i=1}^{n}x_{i}{\log(x_{i})}\); in this case, with the argument order of Definition 5, we easily calculate

$$ B_{\phi}\bigl(x,x^{\prime}\bigr)= \sum _{i=1}^{n}x^{\prime}_{i}\log\frac{x^{\prime}_{i}}{x_{i}} + \sum_{i=1}^{n}\bigl(x_{i} - x^{\prime}_{i}\bigr). $$
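The entropy case can be checked numerically. The sketch below evaluates \(B_{\phi}(x,y)=\phi(y)-\phi(x)-\nabla\phi(x)^{T}(y-x)\) directly from Definition 5 for \(\phi(x)=\sum_{i}x_{i}\log x_{i}\) and compares it with the generalized Kullback–Leibler expression \(\sum_{i}y_{i}\log(y_{i}/x_{i})+\sum_{i}(x_{i}-y_{i})\), which is what Definition 5 yields for this ϕ. The function names and the test vectors are assumptions chosen for illustration, and both arguments are taken with strictly positive entries.

```python
import math


def phi(x):
    """Negative entropy phi(x) = sum_i x_i log x_i, for x with positive entries."""
    return sum(xi * math.log(xi) for xi in x)


def grad_phi(x):
    return [math.log(xi) + 1.0 for xi in x]


def bregman(x, y):
    """B_phi(x, y) = phi(y) - phi(x) - <grad phi(x), y - x> (Definition 5)."""
    gx = grad_phi(x)
    return phi(y) - phi(x) - sum(gi * (yi - xi) for gi, xi, yi in zip(gx, x, y))


def generalized_kl(y, x):
    """sum_i y_i log(y_i / x_i) + sum_i (x_i - y_i)."""
    return (sum(yi * math.log(yi / xi) for xi, yi in zip(x, y))
            + sum(xi - yi for xi, yi in zip(x, y)))


if __name__ == "__main__":
    x, y = [0.2, 0.5, 1.3], [0.7, 0.1, 2.0]
    print(bregman(x, y), generalized_kl(y, x))   # the two values agree
    print(bregman(x, x))                         # zero when the arguments coincide
```

The divergence is nonnegative and vanishes only when the two arguments coincide, which is what makes it usable as a building block of a Lyapunov function.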

The definition of the conjugate function and its properties are very important in the analysis of mirror descent.

Remark 3

Let \(\mathcal {Z} = \{{z\in\mathbb{R}^{n}\mid z=\nabla\phi (x), x\in \mathcal {X}}\}\). We can define the so-called Fenchel coupling \(F(x^{*}, z) = \phi(x^{*}) + \phi^{*}(z) - \langle z, x^{*}\rangle\) for \(x^{*}\in \mathcal {X}\) and \(z \in{ \mathcal {Z}}\), which is nonnegative and strictly convex in both arguments.

Definition 6

(Legendre–Fenchel conjugate [39])

For a distance-generating function \(\phi: \mathcal {X} \rightarrow{\mathbb {R}}\), its Legendre–Fenchel conjugate \(\phi^{*}\) is the convex function defined as \(\phi^{*}(\omega)=\sup_{x\in\mathcal {X}}\{\langle x, \omega\rangle-\phi (x)\}\), which can be shown to be strictly convex and twice differentiable.

In our notation, the general continuous-time mirror ascent algorithm for the constrained optimization problem \(\max_{x \in\mathcal {X}}\digamma(x)\) takes the form

$$\begin{aligned}& \dot{z} =\nabla\digamma(x), \end{aligned}$$
$$\begin{aligned}& x =\nabla\phi^{*}(z). \end{aligned}$$

Remark 4

For later use, let us give some comments regarding the Legendre–Fenchel conjugate and its properties:

  • Biconjugate: similarly, we can define \(\phi^{**}: \mathcal {X} \rightarrow \mathbb {R}\) as \(\phi^{**}(x)=\sup_{\omega\in\mathcal {X}}\{\langle\omega, x\rangle-\phi ^{*}(\omega)\}\). Under the condition that ϕ is strictly convex and twice differentiable, we have \(\phi^{**}=\phi\).

  • The gradients of ϕ and \(\phi^{*}\) are inverse to each other. This can be seen as follows. For any fixed \(\omega ^{*} \in\mathcal {X}\), evaluating \(\phi^{*}(\omega)=\sup_{x\in\mathcal {X}}\{\langle x, \omega\rangle-\phi(x)\}\) at \(\omega ^{*}\), we obtain \(\phi^{*}(\omega^{*})=\sup_{x\in\mathcal {X}}\{\langle x, \omega^{*}\rangle-\phi(x)\} \). Denoting the maximizer by \(x^{*}\), we have \(\langle x^{*}, \omega ^{*}\rangle=\phi(x^{*})+\phi ^{*}(\omega ^{*})\) or equivalently \(\langle x^{*}, \omega ^{*}\rangle=\phi^{**}(x^{*})+\phi^{*}(\omega ^{*})\). It then follows that the supremum in \(\phi(x^{*})=\phi^{**}(x^{*})=\sup_{\omega\in\mathcal {X}}\{\langle\omega, x^{*}\rangle-\phi^{*}(\omega)\}\) is achieved at \(\omega ^{*}\). Since \(x^{*}\) and \(\omega ^{*}\) are respectively the maximizers in \(\phi^{*}(\omega ^{*})=\sup_{x\in\mathcal {X}}\{\langle x, \omega^{*}\rangle-\phi(x)\}\) and \(\phi (x^{*})=\sup_{\omega\in\mathcal {X}}\{\langle\omega, x^{*}\rangle-\phi^{*}(\omega)\} \), by the Fermat theorem we have \(\omega^{*}=\nabla\phi(x^{*})\) and \(x^{*}=\nabla\phi^{*}(\omega ^{*})\). Therefore, \(\omega ^{*}=\nabla\phi(\nabla\phi^{*}(\omega ^{*}))\). By the arbitrariness of \(\omega ^{*}\) it follows that \((\nabla\phi)^{-1}=\nabla\phi^{*}\) and \((\nabla\phi^{*})^{-1}=\nabla\phi\).
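The inverse relation between the two gradients can be verified numerically in a case where the conjugate is known in closed form. The sketch below assumes \(\phi(x)=\sum_{i}x_{i}\log x_{i}\) on the positive orthant, for which \(\phi^{*}(z)=\sum_{i}e^{z_{i}-1}\); this choice, the function names, and the test points are assumptions made only for this illustration.

```python
import math


def phi(x):
    return sum(xi * math.log(xi) for xi in x)


def grad_phi(x):
    """Gradient of phi: maps a point x to its mirror image z."""
    return [math.log(xi) + 1.0 for xi in x]


def phi_conj(z):
    """Conjugate of phi: sup_x { <x, z> - phi(x) } = sum_i exp(z_i - 1)."""
    return sum(math.exp(zi - 1.0) for zi in z)


def grad_phi_conj(z):
    """Gradient of the conjugate: maps z back to x."""
    return [math.exp(zi - 1.0) for zi in z]


if __name__ == "__main__":
    x = [0.3, 1.0, 2.5]
    z = grad_phi(x)
    print(grad_phi_conj(z))   # recovers x up to rounding
    # Fenchel equality <x, z> = phi(x) + phi*(z) at z = grad phi(x)
    print(sum(xi * zi for xi, zi in zip(x, z)), phi(x) + phi_conj(z))
```

The round trip \(x\mapsto\nabla\phi(x)\mapsto\nabla\phi^{*}(\nabla\phi(x))\) returns x, and the equality \(\langle x,z\rangle=\phi(x)+\phi^{*}(z)\) holds exactly at \(z=\nabla\phi(x)\), matching the argument in the remark.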

Mirror design for dual λ-dynamics

Recall that the optimization problem (3.1) incorporates the subproblem of maximizing \(L(x,\lambda,\nu)\) with respect to \((\lambda , \nu)\). Since there is no constraint on the multiplier ν, maximization of \(L(x,\lambda,\nu)\) with respect to ν can be realized by running the gradient ascent dynamics \(\dot{\nu}(t)=\nabla_{\nu}L(x(t), \lambda (t), \nu(t))\). However, this method does not apply to the maximization of \(L(x,\lambda,\nu)\) with respect to λ since λ is required to stay nonnegative. To tackle this constrained optimization problem \(\max_{\lambda \in \Lambda } L(x, \lambda , \nu)\), we use the mirror ascent framework developed in the last subsection. According to Remark 4, equation (4.6) is equivalent to \(z=\nabla\phi(x)\). Taking the time derivative on both sides yields \(\dot{z}=\nabla^{2} \phi(x) \dot{x}\), which, together with (4.5), leads to \(\dot{x}=[\nabla ^{2} \phi(x)]^{-1}\nabla\digamma(x)\).
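The equivalence between the mirror form (4.5)–(4.6) and the Hessian-scaled form \(\dot{x}=[\nabla^{2}\phi(x)]^{-1}\nabla\digamma(x)\) can be observed numerically. The sketch below is a scalar illustration under several assumptions that are not from the paper: \(\phi(x)=x\log x\) (so \(\nabla\phi^{*}(z)=e^{z-1}\) and \([\nabla^{2}\phi(x)]^{-1}=x\)), the concave objective \(\digamma(x)=-\frac{1}{2}(x-2)^{2}\), and explicit Euler integration with a small step.

```python
import math


def grad_F(x):
    """Gradient of the assumed objective F(x) = -0.5 * (x - 2)^2."""
    return 2.0 - x


def simulate(x0=0.5, dt=1e-3, steps=10_000):
    """Integrate both forms of the mirror ascent flow with explicit Euler."""
    z = math.log(x0) + 1.0        # z = grad phi(x0) for phi(x) = x log x
    x_mirror = x0                 # state recovered from z via grad phi*
    x_direct = x0                 # state of the Hessian-scaled form
    max_gap = 0.0
    for _ in range(steps):
        # Mirror form: z' = grad F(x), x = grad phi*(z) = exp(z - 1).
        z += dt * grad_F(x_mirror)
        x_mirror = math.exp(z - 1.0)
        # Hessian-scaled form: x' = [phi''(x)]^{-1} grad F(x) = x * grad F(x).
        x_direct += dt * x_direct * grad_F(x_direct)
        max_gap = max(max_gap, abs(x_mirror - x_direct))
    return x_mirror, x_direct, max_gap


if __name__ == "__main__":
    print(simulate())
```

Both trajectories approach the maximizer \(x=2\) and remain positive; the small gap between them comes only from the discretization, since the two forms describe the same continuous-time flow.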

Now, by replacing \(\digamma\) and \(\mathcal {X}\) in (4.4) respectively with \(L(x, \lambda , \nu)\), regarded as a function of λ, and \(\Lambda \), the rephrased mirror ascent algorithm in this case becomes \(\dot{\lambda}=[\nabla^{2} \phi(\lambda )]^{-1}\nabla _{\lambda } L(x, \lambda , \nu)\). Noting that the constraint set is Λ, we choose ϕ to be \(\phi (\lambda )= \frac{\alpha}{2} \lVert\lambda\rVert^{2} + \beta \sum_{i=1}^{N} \sum_{j=1}^{r_{i}} \lambda_{ij}\ln\lambda_{ij}\), which is well defined on \({\mathbb{R}}^{r}_{+}\) (with the convention \(0\ln0=0\)), where α and β are arbitrary positive real numbers. We can check that \(\nabla\phi\) is a surjective mapping from the interior of Λ onto \({\mathbb {R}}^{r}\). Now the Hessian \(\nabla^{2} \phi(\lambda )\) can be easily calculated as

$$ \nabla^{2}\phi(\lambda) = \operatorname{diag} \biggl( \alpha+\frac{\beta}{\lambda_{11}}, \ldots, \alpha+\frac{\beta}{\lambda_{1r_{1}}}, \alpha+\frac{\beta}{\lambda_{21}}, \ldots, \alpha+\frac{\beta}{\lambda_{2r_{2}}}, \ldots, \alpha+\frac{\beta}{\lambda_{N1}}, \ldots, \alpha+\frac{\beta}{\lambda_{Nr_{N}}} \biggr), $$

and, consequently,

$$ \bigl[\nabla^{2}\phi(\lambda)\bigr]^{-1} = \operatorname{diag} \biggl( \frac{\lambda_{11}}{\beta+\alpha\lambda_{11}}, \ldots, \frac{\lambda_{1r_{1}}}{\beta+\alpha\lambda_{1r_{1}}}, \frac{\lambda_{21}}{\beta+\alpha\lambda_{21}}, \ldots, \frac{\lambda_{2r_{2}}}{\beta+\alpha\lambda_{2r_{2}}}, \ldots, \frac{\lambda_{N1}}{\beta+\alpha\lambda_{N1}}, \ldots, \frac{\lambda_{Nr_{N}}}{\beta+\alpha\lambda_{Nr_{N}}} \biggr). $$

Therefore, the mirror ascent algorithm \(\dot{\lambda}=[\nabla^{2} \phi (\lambda )]^{-1}\nabla_{\lambda } L(x, \lambda , \nu)\) can be explicitly represented as

$$ \dot{\lambda}_{ij}(t)=\frac{\lambda_{ij}(t)}{\beta+\alpha\lambda _{ij}(t)} g_{ij} \bigl(x_{i}(t)\bigr),\quad j=1, \ldots, r_{i}, i=1, \ldots, N. $$

It can be shown by positive system theory that \(\lambda _{ij}(t)\) remains nonnegative for all \(t\geq0\) if \(\lambda _{ij}(0)\geq0\).
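The invariance of the nonnegative orthant can be observed in a simple numerical experiment. The sketch below integrates the scalar mirror dynamics \(\dot{\lambda}=\frac{\lambda}{\beta+\alpha\lambda}g\) with explicit Euler and compares it with plain gradient ascent \(\dot{\lambda}=g\) without projection. The driving signal \(g(t)=\cos t-0.8\), which stands in for \(g_{ij}(x_{i}(t))\), as well as the parameter values and the function name, are assumptions made only for this illustration.

```python
import math


def simulate_multipliers(alpha=1.0, beta=1.0, lam0=1.0, dt=1e-3, steps=10_000):
    """Return the smallest multiplier value seen under each dynamics."""
    lam_mirror, lam_naive = lam0, lam0
    min_mirror, min_naive = lam0, lam0
    for k in range(steps):
        # Hypothetical constraint signal; it is negative most of the time.
        g = math.cos(k * dt) - 0.8
        # Mirror dynamics: the factor lam / (beta + alpha * lam) vanishes at lam = 0.
        lam_mirror += dt * lam_mirror / (beta + alpha * lam_mirror) * g
        # Unprojected gradient ascent, shown only for comparison.
        lam_naive += dt * g
        min_mirror = min(min_mirror, lam_mirror)
        min_naive = min(min_naive, lam_naive)
    return min_mirror, min_naive


if __name__ == "__main__":
    print(simulate_multipliers())
```

In this run the mirror multiplier decays toward zero but stays positive, whereas the unprojected multiplier becomes negative. The right-hand side of the mirror dynamics vanishes at \(\lambda=0\), so the boundary of Λ cannot be crossed, and no projection is needed.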

Distributed mirror descent design for primal x-dynamics

Let ϕ be a distance-generating function and define \(\mathcal {Z}=\{ z\in \mathbb {R}^{n}\mid z=\nabla\phi(x), x\in\mathcal {X}\}\) to be the image of \(\mathcal {X}\) under the mapping \(\nabla\phi\). Then \(\nabla\phi: \mathcal {X}\rightarrow \mathcal {Z}\). It also follows from Remark 4 that \(\nabla\phi^{*}: \mathcal {Z}\rightarrow \mathcal {X}\). With these preparations, the x-dynamics (4.1) defined in the state space \(\mathcal {X}=\mathbb{R}^{n}\) can be extended to the mirror descent dynamics \((x,z)\) defined in the extended state space \(\mathcal {X} \times\mathcal {Z}\) as follows:

$$ \begin{aligned} &\dot{z}_{i}=-\nabla f_{i}(x_{i})- \sum_{j=1}^{r_{i}}\lambda_{ij}\nabla g_{ij}(x_{i})-\sum_{j=1}^{s_{i}} \nu_{ij}\nabla h_{ij}(x_{i})+\sum _{j\in \mathcal{N}_{i}}(x_{j}-x_{i}), \\ &x_{i} =\nabla\phi^{*}(z_{i}). \end{aligned} $$

In conclusion, for the optimization problem (3.1), we obtain the following mirror algorithm:

$$\begin{aligned} &\dot{z}_{i}=-\nabla f_{i}(x_{i})-\sum _{j=1}^{r_{i}}\lambda_{ij}\nabla g_{ij}(x_{i})-\sum_{j=1}^{s_{i}} \nu_{ij}\nabla h_{ij}(x_{i})+\sum _{j\in \mathcal{N}_{i}}(x_{j}-x_{i}), \end{aligned}$$
$$\begin{aligned} &x_{i}=\nabla\phi^{*}(z_{i}), \end{aligned}$$
$$\begin{aligned} &\dot{\lambda}_{ij}=\frac{\lambda_{ij}}{\beta+ \alpha\lambda_{ij}} g_{ij}(x_{i}), \quad j=1, \ldots, r_{i}, \end{aligned}$$
$$\begin{aligned} &\dot{\nu}_{ij}=h_{ij}(x_{i}),\quad j=1, \ldots, s_{i}. \end{aligned}$$
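To show how the four equations fit together, the following sketch integrates them with explicit Euler on a deliberately simple toy network. Every modeling choice here is an assumption for illustration and not the paper's simulation setup: three agents on a path graph, scalar states, identical local data \(f_{i}(x)=x^{2}/2\) and \(g_{i}(x)=1-x\leq0\), no equality constraints (so the ν-dynamics is absent), and the Euclidean choice \(\phi(x)=x^{2}/2\) for the primal mirror map, so that \(x_{i}=\nabla\phi^{*}(z_{i})=z_{i}\). With identical local data, the consensus point \(x^{*}=1\) with \(\lambda_{i}^{*}=1\) is stationary for each agent separately; the sketch does not examine heterogeneous costs.

```python
def simulate_network(alpha=1.0, beta=1.0, dt=0.01, steps=20_000):
    """Euler simulation of the mirror primal-dual dynamics on a toy network."""
    neighbors = {0: [1], 1: [0, 2], 2: [1]}   # path graph 0 - 1 - 2
    x = [-1.0, 0.5, 3.0]                      # z_i = x_i since phi(x) = x^2 / 2
    lam = [0.5, 2.0, 0.1]                     # positive initial multipliers
    for _ in range(steps):
        x_dot, lam_dot = [], []
        for i in range(3):
            grad_f = x[i]                     # gradient of f_i(x) = x^2 / 2
            g = 1.0 - x[i]                    # constraint value g_i(x_i)
            grad_g = -1.0                     # gradient of g_i
            consensus = sum(x[j] - x[i] for j in neighbors[i])
            # Primal (z-)dynamics with the consensus term.
            x_dot.append(-grad_f - lam[i] * grad_g + consensus)
            # Smooth dual dynamics; no projection is applied.
            lam_dot.append(lam[i] / (beta + alpha * lam[i]) * g)
        x = [xi + dt * d for xi, d in zip(x, x_dot)]
        lam = [li + dt * d for li, d in zip(lam, lam_dot)]
    return x, lam


if __name__ == "__main__":
    states, multipliers = simulate_network()
    print(states)        # all entries close to 1
    print(multipliers)   # all entries close to 1 and positive
```

In this symmetric example the agents reach consensus on the constrained minimizer while the multipliers stay positive throughout. A non-Euclidean ϕ would only change the line recovering \(x_{i}\) from \(z_{i}\).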

Let \(x^{*}\) be the optimal solution of the optimization problem (3.1), and let \(\lambda _{ij}^{*}\) and \(\nu_{ij}^{*}\) be defined as in Theorem 2. Define \(X=\operatorname{col}\{x_{1}, \ldots, x_{N}\}\) and \(X^{*}=\operatorname{col}\{x^{*}, \ldots, x^{*}\}\). Then by Theorem 2 we can see that \((X^{*}, \lambda ^{*}, \nu^{*})\) is the equilibrium of the dynamical system (4.8a)–(4.8d). Therefore, if we can prove the asymptotic stability of the equilibrium \((X^{*}, \lambda ^{*}, \nu^{*})\) of the dynamical system (4.8a)–(4.8d), then \(X(t) \stackrel{t\rightarrow \infty}{\longrightarrow} X^{*}\), which implies that the states of all agents estimate the optimal solution \(x^{*}\) consensually and asymptotically. The following theorem presents a convergence analysis for system (4.8a)–(4.8d).

Theorem 3

For the constrained optimization problem (3.1), let Slater's constraint qualification in Definition 3 be satisfied. Suppose there are N agents whose dynamics are given by (4.8a)–(4.8d) and which are connected by a fixed network. Then for any initial condition with \(\lambda_{ij}(0)\geq0\), we have \(\lambda_{ij}(t)\geq0\) and \(\lim_{t\rightarrow\infty}\|x_{i}(t)-x^{*}\|=0\).


Proof

We use the Lyapunov method to prove the stability. To this end, construct a Lyapunov candidate \(V(X, \lambda, \nu) = V_{1} + V_{2} + V_{3} + V_{4} \) as

$$\begin{aligned} & V_{1}= \sum_{i=1}^{N}\bigl[ \phi\bigl(x^{*}\bigr) + \phi^{*}(z_{i})- \bigl\langle z_{i}, x^{*}\bigr\rangle \bigr],\quad z_{i}=\nabla\phi(x_{i}), \\ & V_{2}=\frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{r_{i}}\alpha\bigl(\lambda _{ij}-\lambda_{ij}^{*}\bigr)^{2}, \\ & V_{3}= \sum_{i=1}^{N} \sum _{j=1}^{r_{i}}\beta\bigl( \lambda_{ij}-\lambda _{ij}^{*}\bigr)-\sum _{(i,j)\in\bigcup}\beta\lambda_{ij}^{*}\bigl(\ln\lambda_{ij}- \ln \lambda_{ij}^{*}\bigr), \\ & V_{4}=\frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{s_{i}}\bigl(\nu_{ij}- \nu_{ij}^{*}\bigr)^{2}, \end{aligned}$$

where \(\bigcup=\{(i,j)\mid \lambda_{ij}^{*} \neq0\}\). According to the third equation in (4.8a)–(4.8d), we see that \(\lambda_{ij}(t)\geq0\) if \(\lambda_{ij}(0)>0\); this equation also implies \(\dot{\lambda}_{ij}\leq0\), which means that \(\lambda_{ij}(t)\) is a decreasing function of t, and consequently \(\lambda_{ij}(t)\geq \lambda_{ij}^{*}>0\) for \((i,j) \in\bigcup\). Thus, \(V_{3}\) is well defined. Further, a simple calculation shows that \(V_{3}\geq0\). (In fact, up to the factor β, \(V_{3}\) is the Bregman divergence between \(\lambda^{*}\) and λ induced by the negative entropy \(\phi(x)=\sum_{i} x_{i} \ln x_{i}\), which is strongly convex with respect to the norm \(\|\cdot\|_{1}\).) Therefore, \(V\geq0\), and \(V(X, \lambda, \nu)=0\) if and only if \((X, \lambda, \nu)=(X^{*}, \lambda ^{*}, \nu^{*})\).
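As a sanity check on these nonnegativity claims, and only as a sketch: for the Euclidean choice \(\phi(x)=\frac{1}{2}\|x\|^{2}\) (an illustrative assumption; the analysis does not fix ϕ), each summand of \(V_{1}\) reduces to a squared distance. The terms of \(V_{3}\) can be bounded with the elementary inequality \(\ln u\leq u-1\) for \(u>0\), reading the logarithmic sum in \(V_{3}\) as running once over the pairs in \(\bigcup\):

$$\begin{aligned} &\phi\bigl(x^{*}\bigr)+\phi^{*}(z_{i})-\bigl\langle z_{i},x^{*}\bigr\rangle =\tfrac{1}{2}\bigl\| x^{*}\bigr\| ^{2}+\tfrac{1}{2}\| x_{i}\| ^{2}-\bigl\langle x_{i},x^{*}\bigr\rangle =\tfrac{1}{2}\bigl\| x_{i}-x^{*}\bigr\| ^{2}, \\ &V_{3}=\beta\sum_{(i,j)\notin\bigcup}\lambda_{ij}+\beta\sum_{(i,j)\in\bigcup}\lambda_{ij}^{*}\biggl(\frac{\lambda_{ij}}{\lambda_{ij}^{*}}-1-\ln\frac{\lambda_{ij}}{\lambda_{ij}^{*}}\biggr)\geq0. \end{aligned}$$

Both sums in the second line are nonnegative whenever \(\lambda_{ij}\geq0\), and they vanish only if \(\lambda_{ij}=\lambda_{ij}^{*}\) for all pairs.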

We now calculate the time derivative of the Lyapunov function V along the trajectories of system (4.8a)–(4.8d). For \(V_{1}\), a straightforward calculation shows that

$$\begin{aligned} \dot{V_{1}}&=\sum_{i=1}^{N} \bigl[ \nabla\phi^{*}(z_{i})\dot{z_{i}} - \bigl\langle \dot{z_{i}} , x^{*} \bigr\rangle \bigr] \\ &= \sum_{i=1}^{N} \bigl[ \bigl\langle \dot{z_{i}} , \nabla\phi^{*}(z_{i}) \bigr\rangle - \bigl\langle \dot{z_{i}} , x^{*} \bigr\rangle \bigr] \\ &= \sum_{i=1}^{N} \bigl[ \bigl\langle \dot{z_{i}} , x_{i} - x^{*} \bigr\rangle \bigr] \\ &= \sum_{i=1}^{N} \Biggl[ \Biggl\langle {x_{i} - x^{*}}, -\nabla f_{i}(x_{i})+ \sum _{j \in \mathcal{N}_{i}}(x_{j}-x_{i}) - \sum _{j=1}^{r_{i}}\lambda_{ij}\nabla g_{ij}(x_{i}) - \sum_{j=1}^{s_{i}} \nu_{ij}\nabla h_{ij}(x_{i}) \Biggr\rangle \Biggr]. \end{aligned}$$

Since \((y-x)^{T}\nabla f(x)\leq f(y)-f(x)\) and \(\mathcal {L} X^{*}=0\), we get

$$\begin{aligned} \dot{V_{1}}& = \sum_{i=1}^{N} \bigl(x_{i}-x^{*} \bigr)^{T} \bigl(-\nabla f_{i}(x_{i}) \bigr) + \sum_{i=1}^{N} \biggl\langle {x_{i}-x^{*}}, \sum_{j\in\mathcal{N}_{i}}(x_{j}-x_{i}) \biggr\rangle \\ &\quad {} -\sum_{i=1}^{N}\sum _{j=1}^{r_{i}}\lambda _{ij} \bigl(x_{i}-x^{*}\bigr)^{T} \nabla g_{ij}(x_{i}) -\sum_{i=1}^{N}\sum _{j=1}^{s_{i}}\nu_{ij}\bigl(x_{i}-x^{*} \bigr)^{T} \nabla h_{ij}(x_{i}) \\ &\leq -\bigl(X-X^{*}\bigr)^{T} \mathcal {L} \bigl(X-X^{*}\bigr)+\sum _{i=1}^{N}f_{i}\bigl(x^{*}\bigr) -\sum _{i=1}^{N}f_{i}(x_{i}) \\ & \quad {}+\sum_{i=1}^{N}\sum _{j=1}^{r_{i}}\lambda _{ij}g_{ij} \bigl(x^{*}\bigr)-\sum_{i=1}^{N}\sum _{j=1}^{r_{i}}\lambda _{ij}g_{ij}(x_{i})+ \sum_{i=1}^{N}\sum _{j=1}^{s_{i}}\nu _{ij}h_{ij} \bigl(x^{*}\bigr)-\sum_{i=1}^{N}\sum _{j=1}^{s_{i}}\nu_{ij}h_{ij}(x_{i}). \end{aligned}$$


For \(V_{2}\), \(V_{3}\), and \(V_{4}\), substituting the multiplier dynamics (4.8c) and (4.8d) gives

$$\begin{aligned}& \begin{aligned} \dot{V_{2}} + \dot{V_{3}} &=\sum _{i=1}^{N}\sum_{j=1}^{r_{i}} \frac{\alpha\lambda_{ij}(\lambda _{ij} - \lambda_{ij}^{*})}{\beta+ \alpha\lambda_{ij}} g_{ij}(x_{i}) +\sum _{i=1}^{N}\sum_{j=1}^{r_{i}} \frac{\beta\lambda_{ij}}{\beta+ \alpha\lambda_{ij}} g_{ij}(x_{i})-\sum _{(i,j)\in\bigcup} \frac{\beta \lambda_{ij}^{*}}{\beta+ \alpha\lambda_{ij}} g_{ij}(x_{i}) \\ & =\sum_{i=1}^{N}\sum _{j=1}^{r_{i}} \bigl(\lambda_{ij}-\lambda _{ij}^{*}\bigr)g_{ij}(x_{i}), \end{aligned} \\& \dot{V_{4}}=\sum_{i=1}^{N}\sum _{j=1}^{s_{i}} \bigl(\nu_{ij}-\nu_{ij}^{*} \bigr) h_{ij}(x_{i}). \end{aligned}$$

Combining these calculations together yields

$$\begin{aligned} \dot{V}&\leq\sum_{i=1}^{N} f_{i} \bigl(x^{*}\bigr) + \sum_{i=1}^{N} \sum _{j=1}^{r_{i}}\lambda_{ij}g_{ij} \bigl(x^{*}\bigr) + \sum_{i=1}^{N} \sum _{j=1}^{s_{i}}\nu_{ij}h_{ij} \bigl(x^{*}\bigr) \\ &\quad {}-\sum_{i=1}^{N} f_{i}(x_{i}) - \sum_{i=1}^{N} \sum_{j=1}^{r_{i}}\lambda _{ij}^{*}g_{ij}(x_{i}) - \sum_{i=1}^{N} \sum _{j=1}^{s_{i}}\nu _{ij}^{*}h_{ij}(x_{i}) \\ &\quad {} - \bigl(X-X^{*}\bigr)^{T}\mathcal {L}\bigl(X-X^{*}\bigr). \end{aligned}$$

According to definition (1), we now define another Lagrangian \(\bar{L}: \mathbb{R}^{nN} \times\mathbb{R}^{r}_{+} \times \mathbb{R}^{s}\rightarrow\mathbb{R}\) as follows:

$$ \bar{L}(X, \lambda, \nu)=\sum_{i=1}^{N}f_{i}(x_{i})+ \sum_{i=1}^{N}\sum _{j=1}^{r_{i}}\lambda_{ij}g_{ij}(x_{i})+ \sum_{i=1}^{N}\sum _{j=1}^{s_{i}}\nu_{ij}h_{ij}(x_{i}). $$

Then we get

$$\begin{aligned} \dot{V} & \leq\bar{L}\bigl(X^{*}, \lambda, \nu\bigr)-\bar{L}\bigl(X, \lambda^{*}, \nu ^{*}\bigr)-\bigl(X-X^{*}\bigr)^{T}\mathcal {L}\bigl(X-X^{*}\bigr) \\ & \leq0. \end{aligned}$$
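The final inequality can be unpacked as follows. This is a sketch that takes for granted, as the remainder of the proof does, that \((X^{*},(\lambda^{*},\nu^{*}))\) is a saddle point of \(\bar{L}\) and that the Laplacian \(\mathcal{L}\) is positive semidefinite:

$$ \bar{L}\bigl(X^{*},\lambda,\nu\bigr)\leq\bar{L}\bigl(X^{*},\lambda^{*},\nu^{*}\bigr)\leq\bar{L}\bigl(X,\lambda^{*},\nu^{*}\bigr), \qquad \bigl(X-X^{*}\bigr)^{T}\mathcal{L}\bigl(X-X^{*}\bigr)\geq0. $$

The chain on the left makes the difference \(\bar{L}(X^{*},\lambda,\nu)-\bar{L}(X,\lambda^{*},\nu^{*})\) nonpositive, and the quadratic form on the right enters the bound with a minus sign.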

We finally prove the asymptotic stability of the equilibrium. To this end, letting \(\dot{V}=0\) yields

$$\begin{aligned} &\bar{L}\bigl(X^{*}, \lambda, \nu\bigr)-\bar{L}\bigl(X, \lambda^{*}, \nu^{*}\bigr)=0, \end{aligned}$$
$$\begin{aligned} &\bigl(X-X^{*}\bigr)^{T}\mathcal {L}\bigl(X-X^{*}\bigr)=0. \end{aligned} \tag{4.11}$$

Since the graph is connected, it then follows from (4.11) that \(X=X^{*}\). Also noting that \((X^{*}, (\lambda^{*}, \nu^{*}))\) is a saddle point of the Lagrangian \(\bar{L}\), we have \(\bar{L}(X^{*}, \lambda, \nu)=\bar{L}(X, \lambda^{*}, \nu^{*})=\bar{L}(X^{*}, \lambda^{*}, \nu^{*})\). This yields \(\sum_{i=1}^{N}\sum_{j=1}^{r_{i}}(\lambda_{ij}-\lambda _{ij}^{*})g_{ij}(x^{*})=0\). Since \(g_{ij}(x^{*})\leq0\) and \(\lambda_{ij}\geq\lambda_{ij}^{*}\), every term of this sum is nonpositive, and it follows that \((\lambda_{ij}-\lambda_{ij}^{*})g_{ij}(x^{*})=0\) for all i and j. Together with the KKT condition \(\lambda_{ij}^{*}g_{ij}(x^{*})=0\), this implies that \(\lambda_{ij}=0\) for every pair \((i,j)\) with \(g_{ij}(x^{*})<0\). Then from equation (4.8a) we have \(\nabla f_{i}(x^{*})+\sum_{j=1}^{r_{i}}\lambda_{ij}\nabla g_{ij}(x^{*})+\sum_{j=1}^{s_{i}}\nu _{ij}\nabla h_{ij}(x^{*})=0\). Therefore the KKT conditions are satisfied, and by uniqueness \(\lambda_{ij}=\lambda_{ij}^{*}\), \(\nu_{ij}=\nu_{ij}^{*}\). Therefore \((X, \lambda, \nu)=(X^{*}, \lambda^{*}, \nu^{*})\). The application of the LaSalle invariance principle in [40] yields that the equilibrium \((X^{*}, \lambda^{*}, \nu^{*})\) of system (4.8a)–(4.8d) is asymptotically stable almost surely. □


Consider the optimization problem (3.1) on a network with five agents, where \((x_{1},x_{2})\) denotes the two components of the decision variable. The local cost functions of the five agents are as follows: \(f_{1}(x_{1},x_{2})=4x_{1}^{2}+2x_{2}\), \(f_{2}(x_{1},x_{2})=2x_{1}^{2}\), \(f_{3}(x_{1},x_{2})=4x_{1}\), \(f_{4}(x_{1},x_{2})=2x_{2}\), \(f_{5}(x_{1},x_{2})=3x_{1}+x_{2}\). Assume that agent 1 has both inequality and equality constraints with constraint functions \(g_{1}(x_{1}, x_{2})=(x_{1}-2)^{2}-x_{2}+1\) and \(h_{1}(x_{1}, x_{2})=2x_{1}-x_{2}\). Agent 2 has the inequality constraint \(g_{2}(x_{1},x_{2})=-x_{1}+x_{2}-2\), whereas there are no constraints for agents 3, 4, 5. We can check that all these functions are convex and that the feasible set is nonempty. The true optimal solution and optimal value of this problem are \((x_{1}^{*}, x_{2}^{*})=(1, 2)\) and \(\tilde{f}(x_{1}^{*}, x_{2}^{*})=23\).
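The stated optimum can be cross-checked without any optimization dynamics. The sketch below evaluates the aggregate cost and the constraints at \((1,2)\) and then scans the line \(x_{2}=2x_{1}\) imposed by the equality constraint. Here \(f_{2}\) is taken as \(2x_{1}^{2}\), the reading consistent with the stated optimal value 23; the grid range and tolerance are arbitrary illustrative choices.

```python
import numpy as np

# Aggregate cost of the five agents; f2 = 2*x1**2 is the reading that matches
# the stated optimal value 23.
def total_cost(x1, x2):
    f1 = 4 * x1**2 + 2 * x2
    f2 = 2 * x1**2
    f3 = 4 * x1
    f4 = 2 * x2
    f5 = 3 * x1 + x2
    return f1 + f2 + f3 + f4 + f5

def g1(x1, x2):  # inequality constraint of agent 1, g1 <= 0
    return (x1 - 2) ** 2 - x2 + 1

def h1(x1, x2):  # equality constraint of agent 1, h1 = 0
    return 2 * x1 - x2

def g2(x1, x2):  # inequality constraint of agent 2, g2 <= 0
    return -x1 + x2 - 2

x_star = (1.0, 2.0)

# Brute-force scan along the equality constraint x2 = 2*x1.
grid = np.linspace(-1.0, 6.0, 7001)
feasible = [(a, 2 * a) for a in grid
            if g1(a, 2 * a) <= 1e-9 and g2(a, 2 * a) <= 1e-9]
best = min(feasible, key=lambda p: total_cost(*p))
```

On the line \(x_{2}=2x_{1}\) the two inequality constraints reduce to \(1\leq x_{1}\leq2\), and the aggregate cost \(6x_{1}^{2}+17x_{1}\) is increasing there, so the scan returns the left endpoint.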

In the literature, projected gradient descent is widely used to solve this kind of problem, but the projection is hard to compute. For example, if the gradient projection algorithm in [37] is used here, then we should solve the following iterative problem: \(x_{k+1}=\Pi_{\mathcal {X}}(x_{k}-r_{k}f'(x_{k}))\), where \(\Pi_{\mathcal {X}}(x)\) is the projection of a vector \(x\in \mathbb{R}^{n}\) onto the convex compact set \(\mathcal {X}\subset\mathbb {R}^{n}\) in the sense of \(\Pi_{\mathcal {X}}(x)=\arg\min_{y\in\mathcal {X}}\| y-x\|\). In our case, \(\mathcal {X}=\{x\in\mathbb{R}^{n}\mid g_{i}(x) \preccurlyeq0, h_{i}(x)=0, i=1, \ldots, N\}\). Computing \(\Pi_{\mathcal {X}}(x_{k}-r_{k}f'(x_{k}))\) is itself an optimization problem at every iteration, and additional algorithms such as those in [11, 41] have to be adopted. By contrast, our algorithm (4.8a)–(4.8d) avoids projection and thus is easier to carry out.

Now we apply our distributed convex optimization algorithm (4.8a)–(4.8d) by using five agents that are connected as in Fig. 1 to find the optimal solution. The Laplacian matrix of this undirected graph is

$$ \mathcal {L}= \left ( \textstyle\begin{array}{@{}c@{\quad}c@{\quad}c@{\quad}c@{\quad}c@{}} 1 & -1 & 0 & 0 & 0\\ -1 & 2 & -1 & 0 & 0\\ 0 & -1 & 3 & -1 & -1\\ 0 & 0 & -1 & 1 & 0\\ 0 & 0 & -1 & 0 & 1 \end{array}\displaystyle \right ). $$
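The Laplacian above can be rebuilt from an edge list, which also makes it easy to confirm connectivity. In this sketch the edge list is read off the nonzero off-diagonal entries of \(\mathcal{L}\); the variable names and the sample consensus state are illustrative assumptions.

```python
import numpy as np

# Undirected edges read off the Laplacian in the text (1-based agent labels).
edges = [(1, 2), (2, 3), (3, 4), (3, 5)]
n_agents = 5

# Degree on the diagonal, -1 for every pair of neighboring agents.
L = np.zeros((n_agents, n_agents))
for i, j in edges:
    L[i - 1, i - 1] += 1
    L[j - 1, j - 1] += 1
    L[i - 1, j - 1] -= 1
    L[j - 1, i - 1] -= 1

# The graph is connected exactly when the second-smallest eigenvalue is positive.
eigvals = np.linalg.eigvalsh(L)  # ascending order for symmetric matrices
algebraic_connectivity = eigvals[1]

# The consensus coupling sum_j (x_j - x_i), stacked over agents, equals -L @ X,
# where row i of X holds the state of agent i.
X_consensus = np.tile(np.array([1.0, 2.0]), (n_agents, 1))
coupling = -L @ X_consensus
```

The coupling term vanishes whenever all agents hold the same state, which is why it does not disturb a consensus configuration.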

Set the initial values of the five agents' states and multipliers as \(x_{1}(0)=(-2, 4)^{T}\), \(x_{2}(0)=(-3, 3)^{T}\), \(x_{3}(0)=(1, -2)^{T}\), \(x_{4}(0)=(-1, 2)^{T}\), \(x_{5}(0)=(4, 2)^{T}\), \(\lambda_{1}(0)=3\), \(\lambda_{2}(0)=3\), \(\nu(0)=3\). The time evolution of the states of the five agents is illustrated in Fig. 2, where subfigure (a) shows the first component of each state, which asymptotically converges to 1, and subfigure (b) shows the second component of each state, which asymptotically converges to 2. Therefore, each state of the five agents asymptotically converges to the optimal solution \((1, 2)\).

Figure 1. The information exchange among the five agents

Figure 2. The time evolution of the states of the five agents: (a) the first component of each state, which asymptotically converges to 1; (b) the second component of each state, which asymptotically converges to 2. Therefore, each state of the five agents converges to the optimal solution \((1, 2)\)

We include here a comparison of our method with existing algorithms. For example, in the spirit of the primal-dual algorithm in [30], the dynamics for the Lagrange multiplier would be designed as \(\dot{\lambda}=\Gamma P_{+}[-(\frac{\partial L}{\partial\lambda })]=\Gamma P_{+}[g(x)]\) with any positive definite matrix Γ. There, the projection \(P_{+}[g(x)]\) is used to keep the multiplier corresponding to the inequality constraint nonnegative for all time, and thus the corresponding dynamics is nonsmooth. This causes difficulties in both simulation and convergence analysis. In contrast, our algorithm uses equations (4.8c)–(4.8d) for the evolution of the Lagrange multipliers, which are smooth dynamics and easy to simulate.


When considering distributed convex optimization using multiagent systems, a consensus-based distributed method is usually adopted. It is well known that a distributed optimization problem becomes considerably more challenging once constraints are taken into account. One popular method to deal with constraints is based on projection, which causes the optimization dynamics to be nonsmooth. To overcome this difficulty, we propose a novel distributed convex optimization algorithm utilizing the mirror descent method. Although there are reported results on mirror descent for the distributed optimization problem, most of them are presented in discrete-time form, and the existing results on continuous-time mirror descent work only for unconstrained optimization. Our algorithm is valid for the more challenging case of constrained optimization. The advantage of our method over existing results lies in the fact that it avoids gradient projection in the optimization algorithm design. Therefore, it removes the difficulty of analyzing the resulting nonsmooth optimization dynamics and makes the simulation easier. With the aid of mirror descent, we modify the frequently used primal-dual algorithm for the optimization problem, giving rise to new primal and dual dynamics. The modified primal dynamics facilitates more convenient use of the Bregman divergence and the Fenchel coupling in the stability analysis, and the redesigned dual dynamics for the evolution of the nonnegative Lagrange multipliers does not include projection and therefore reduces the complexity of the convergence analysis.

Note that avoiding projection (and therefore avoiding nonsmooth optimization dynamics) is key to our approach. This stands in sharp contrast to the nonsmooth optimization problem considered in [9]. A direct connection with, and adaptation of our method to, the nonsmooth case in [9] needs further investigation.


  1. Baleanu, D., Inc, M., Yusuf, A., Aliyu, A.I.: Time fractional third-order evolution equation: symmetry analysis, explicit solutions, and conservation laws. J. Comput. Nonlinear Dyn. 13, 021011 (2017)
  2. Baleanu, D., Jajarmi, A., Hajipour, M.: A new formulation of the fractional optimal control problems involving Mittag-Leffler nonsingular kernel. J. Optim. Theory Appl. 175, 718–737 (2017)
  3. Baleanu, D., Inc, M., Yusuf, A., Aliyu, A.I.: Lie symmetry analysis, exact solutions and conservation laws for the time fractional Caudrey–Dodd–Gibbon–Sawada–Kotera equation. Commun. Nonlinear Sci. Numer. Simul. 59, 222–234 (2017)
  4. Bai, Y., Baleanu, D., Wu, G.C.: Existence and discrete approximation for optimization problems governed by fractional differential equations. Commun. Nonlinear Sci. Numer. Simul. 59, 338–348 (2018)
  5. Farnad, B., Jafarian, A., Baleanu, D.: A new hybrid algorithm for continuous optimization problem. Appl. Math. Model. 55, 652–673 (2018)
  6. Hajipour, A., Malek, A.: High accurate modified Weno method for the solution of Black–Scholes equation. Comput. Appl. Math. 34, 125–140 (2015)
  7. Jajarmi, A., Hajipour, M.: An efficient finite difference method for the time-delay optimal control problems with time-varying delay. Asian J. Control 19, 554–563 (2017)
  8. Razminia, A., Baleanu, D., Majd, V.: Conditional optimization problems: fractional order case. J. Optim. Theory Appl. 156, 45–55 (2013)
  9. Vaziri, A., Kamyad, A., Jajarmi, A., Effati, S.: A global linearization approach to solve nonlinear nonsmooth constrained programming problems. Comput. Appl. Math. 30, 427–443 (2011)
  10. Inc, M., Yusuf, A., Aliyu, A.I., Baleanu, D.: Lie symmetry analysis, explicit solutions and conservation laws for the space–time fractional nonlinear evolution equations. Phys. A, Stat. Mech. Appl. 496, 371–383 (2018)
  11. Nedic, A., Ozdaglar, A.: Distributed subgradient method for multi-agent optimization. IEEE Trans. Autom. Control 54, 48–61 (2009)
  12. Yi, P., Hong, Y.: Quantized subgradient algorithm and data-rate analysis for distributed optimization. IEEE Trans. Control Netw. Syst. 1, 380–392 (2014)
  13. Duchi, J.C., Agarwal, A., Wainwright, M.J.: Dual averaging for distributed optimization: convergence analysis and network scaling. IEEE Trans. Autom. Control 57, 592–606 (2012)
  14. Deng, Z., Hong, Y., Wang, X.: Distributed optimisation design with triggers for disturbed continuous-time multi-agent systems. IET Control Theory Appl. 11, 282–290 (2017)
  15. Zhu, M., Martinez, S.: On distributed convex optimization under inequality and equality constraints. IEEE Trans. Autom. Control 57, 151–164 (2012)
  16. Yi, P., Hong, Y., Liu, F.: Distributed gradient algorithm for constrained optimization with application to load sharing in power systems. Syst. Control Lett. 83, 45–52 (2015)
  17. Zeng, X., Yi, P., Hong, Y.: Distributed continuous time algorithm for constrained convex optimizations via nonsmooth analysis approach. IEEE Trans. Autom. Control 62, 5227–5233 (2016)
  18. Lou, Y., Hong, Y., Wang, S.: Distributed continuous-time approximate projection protocols for shortest distance optimization problems. Automatica 69, 289–297 (2016)
  19. Calafiore, G., Carlone, L., Wei, M.: A distributed gradient method for localization of formations using relative range measurements. In: IEEE Int. Symp. on Computer-Aided Control System Design, pp. 1146–1151 (2010)
  20. Chen, J.S., Sayed, A.H.: Diffusion adaptation strategies for distributed optimization and learning over network. IEEE Trans. Signal Process. 60, 4289–4305 (2012)
  21. Neglia, G., Reina, G., Alouf, S.: Distributed gradient optimization for epidemic routing: a preliminary evaluation. In: IFIP Conference on Wireless Days, pp. 1–6 (2009)
  22. Ram, S., Nedic, A.: Distributed subgradient method for multi-agent optimization. J. Optim. Theory Appl. 147, 516–545 (2010)
  23. Nemirovski, A.S., Yudin, D.B.: Problem Complexity and Method Efficiency in Optimization. Wiley, New York (1983)
  24. Beck, A., Teboulle, M.: Mirror descent and nonlinear projected subgradient methods for convex optimization. Oper. Res. Lett. 13, 167–175 (2003)
  25. Ben-Tal, A., Margalit, T., Nemirovski, A.: The ordered subsets mirror descent optimization method with applications to tomography. SIAM J. Optim. 12(1), 79–108 (2006)
  26. Li, J.Y., Chen, G.: Distributed mirror descent method for multi-agent optimization with delay. Neurocomputing 177, 643–650 (2016)
  27. Wibisono, A., Wilson, A.C.: On accelerated methods in optimization. (2015)
  28. Xi, C., Wu, Q., Khan, U.A.: Distributed mirror descent over directed graphs. (2014)
  29. Raginsky, M., Bouvrie, J.: Continuous-time stochastic mirror descent on a network: variance reduction, consensus, convergence. In: Decision and Control, pp. 6793–6800 (2012)
  30. Feijer, D., Paganini, F.: Stability of primal-dual gradient dynamics and applications to network optimization. Automatica 46, 1974–1981 (2010)
  31. Tanabe, K.: A geometric method in nonlinear programming. J. Optim. Theory Appl. 30, 181–210 (1980)
  32. Wang, J., Elia, N.: A control perspective for centralized and distributed convex optimization. In: Decision and Control and European Control Conference, pp. 3800–3805 (2011)
  33. Gharesifard, B., Cortes, J.: Distributed continuous time convex optimization on weighted balanced digraphs. IEEE Trans. Autom. Control 59, 781–786 (2014)
  34. Towfic, Z.J., Sayed, A.: Adaptive penalty-based distributed stochastic convex optimization. IEEE Trans. Signal Process. 62, 3924–3938 (2014)
  35. Arrow, K.J., Hurwicz, L., Uzawa, H.: Studies in Linear and Nonlinear Programming. Mathematical Studies in the Social Sciences (1958)
  36. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)
  37. Bertsekas, D.P.: Nonlinear Programming. Athena Scientific (1999)
  38. Brègman, L.M.: Relaxation method for finding a common point of convex sets and its application to optimization problems. Comput. Math. Math. Phys. 48(2), 1019–1022 (1966)
  39. Hiriart-Urruty, J.B., Lemaréchal, C.: Fundamentals of Convex Analysis. Springer, Berlin (2001)
  40. Mao, X.: Stochastic versions of the LaSalle theorem. J. Differ. Equ. 153, 175–195 (1999)
  41. Nedic, A., Ozdaglar, A., Parrilo, P.: Constrained consensus and optimization in multi-agent network. IEEE Trans. Autom. Control 55, 922–938 (2010)


The authors would like to thank the referees for their careful reviews.


This work is supported by the NNSF of China under the grants 61663026, 61473098, 61563033, 11361043, 61603175.

Author information



Both authors contributed equally to the writing of this paper. Both authors of the manuscript have read and agreed to its content and are accountable for all aspects of the accuracy and integrity of the manuscript.

Corresponding author

Correspondence to Wei Ni.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Sheng, R., Ni, W. Distributed constrained optimization via continuous-time mirror design. Adv Differ Equ 2018, 376 (2018).


  • Distributed convex optimization
  • Mirror descent
  • Multiagent
  • Constrained optimization