Noise-tolerant continuous-time Zhang neural networks for time-varying Sylvester tensor equations

Abstract

In this paper, we design three noise-tolerant continuous-time Zhang neural networks (NTCTZNNs), termed NTCTZNN1, NTCTZNN2 and NTCTZNN3, to solve time-varying Sylvester tensor equations (TVSTEs) with noise. The most important characteristic of these neural networks is that they make full use of the time-derivative information of the TVSTEs' coefficients. Theoretical analyses show that, no matter how large the unknown noise is, the residual error generated by NTCTZNN2 converges globally to zero. Meanwhile, as long as the design parameter is large enough, the residual errors generated by NTCTZNN1 and NTCTZNN3 can be made arbitrarily small. For comparison, the gradient-based neural network (GNN) is also presented and analyzed for solving TVSTEs. Numerical examples demonstrate the efficacy and superiority of the proposed neural networks.

Introduction

As is well known, tensors, the higher-order generalizations of matrices, are common tools for building mathematical models of high-dimensional systems. For example, a black-and-white image (width and height) can be stored as a matrix, an RGB image (width, height and color channel) is often stored as a third-order tensor, and a color video (width, height, color channel and time) is stored as a fourth-order tensor. An mth-order real tensor \(\mathcal{A}=(a_{i_{1}i_{2}\ldots i_{m}})\) (\(a_{i_{1}i_{2}\ldots i _{m}}\in \mathbb{R}\), \(1\leq i_{j}\leq I_{j}\), \(j\in \langle m\rangle =\{1,2,\ldots,m\}\)) is a multidimensional array with \(I_{1}I_{2}\cdots I _{m}\) entries. Clearly, a first-order tensor is a vector, and a second-order tensor is a matrix. Let \(\mathbb{R}^{I_{1}\times \cdots \times I _{n}}\) be the set of order-n tensors of dimension \(I_{1}\times \cdots \times I_{n}\) over \(\mathbb{R}\). For any two tensors \(\mathcal{A} \in \mathbb{R}^{I_{1}\times \cdots \times I_{n}\times K_{1}\times \cdots \times K_{n}}\) and \(\mathcal{B}\in \mathbb{R}^{K_{1}\times \cdots \times K_{n}\times J_{1}\times \cdots \times J_{m}}\), the Einstein product \(\mathcal{A}*_{n}\mathcal{B}\) is defined by [1]

$$ (\mathcal{A}*_{n}\mathcal{B})_{i_{1}\cdots i_{n}j_{1}\cdots j_{m}}= \sum_{k_{1}\cdots k_{n}}a_{i_{1}\cdots i_{n}k_{1}\cdots k_{n}}b_{k _{1}\cdots k_{n} j_{1}\cdots j_{m}}, $$
(1)

which indicates that \(\mathcal{A}*_{n}\mathcal{B}\in \mathbb{R}^{I _{1}\times \cdots \times I_{n}\times J_{1}\times \cdots \times J_{m}}\). It is obvious that, when \(m=n=1\), the Einstein product (1) reduces to a matrix product.
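Definition (1) contracts the last n modes of \(\mathcal{A}\) with the first n modes of \(\mathcal{B}\). As an illustrative sketch (assuming NumPy; the helper name `einstein_product` is ours, not from the paper), this is exactly what `numpy.tensordot` computes with an integer `axes` argument:

```python
import numpy as np

def einstein_product(A, B, n):
    """Einstein product A *_n B of (1): contract the last n modes of A with the first n modes of B."""
    # np.tensordot with an integer axes=n sums over the last n axes of A and the first n axes of B.
    return np.tensordot(A, B, axes=n)

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3, 4, 5))   # I1 x I2 x K1 x K2 (n = 2)
B = rng.standard_normal((4, 5, 6))      # K1 x K2 x J1      (m = 1)
P = einstein_product(A, B, 2)           # shape I1 x I2 x J1

# Entry-wise definition (1), written out with einsum for comparison.
P_ref = np.einsum("abkl,klc->abc", A, B)
print(P.shape, np.allclose(P, P_ref))   # (2, 3, 6) True

# For m = n = 1 the Einstein product is the ordinary matrix product.
M, N = rng.standard_normal((3, 4)), rng.standard_normal((4, 2))
print(np.allclose(einstein_product(M, N, 1), M @ N))  # True
```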

In practice, various kinds of tensor equations arise in physics, mechanics, Markov processes and partial differential equations. In this paper, we are interested in the following time-varying Sylvester tensor equations (TVSTEs) defined via the Einstein product:

$$ \mathcal{A}(t)*_{n}\mathcal{X}+ \mathcal{X}*_{m}\mathcal{B}(t)= \mathcal{C}(t), \quad t\geq 0, $$
(2)

where \(\mathcal{A}(t)\in \mathbb{R}^{I_{1}\times \cdots \times I_{n} \times I_{1}\times \cdots \times I_{n}}\), \(\mathcal{B}(t)\in \mathbb{R}^{K_{1}\times \cdots \times K_{m}\times K_{1}\times \cdots \times K_{m}}\), \(\mathcal{C}(t)\in \mathbb{R}^{I_{1}\times \cdots \times I_{n}\times K_{1}\times \cdots \times K_{m}}\) are input tensors, and \(\mathcal{X}\in \mathbb{R}^{I_{1}\times \cdots \times I_{n}\times K_{1}\times \cdots \times K_{m}}\) is an unknown tensor to be determined. Linear tensor equations of this kind arise in physics and in the discretization of linear partial differential equations by finite element and finite difference methods. For example, for noncentrosymmetric materials, the linear piezoelectric equation is expressed as [2]:

$$ \mathcal{A}*_{2}X=p, $$

where \(\mathcal{A}\in \mathbb{R}^{3\times 3\times 3}\) is a piezoelectric tensor, \(X\in \mathbb{R}^{3\times 3}\) is a stress tensor, and \(p\in \mathbb{R}^{3}\) is the electric charge density displacement. The two-dimensional (2D) Poisson problem is

$$ \textstyle\begin{cases} -\nabla ^{2}v=f, & \text{in } \varOmega =\{(x,y)\mid 0< x,y< 1\}, \\ v=0, & \text{on } \partial \varOmega , \end{cases} $$
(3)

where f is a given function, and the Laplace operator \(\nabla ^{2}v\) is defined as

$$ \nabla ^{2}v=\frac{\partial ^{2}v}{\partial x^{2}}+\frac{\partial ^{2}v}{ \partial y^{2}}. $$

Using the standard central difference approximations, the 2D Poisson equation (3) can be written in the following fourth-order tensor form [3]:

$$ \mathcal{A}*_{2}X=C, $$

where \(\mathcal{A}\in \mathbb{R}^{n\times n\times n\times n}\), and \(X\in \mathbb{R}^{n\times n}\) and \(C\in \mathbb{R}^{n\times n}\) are the discretizations of the functions v and f on the unit square mesh. Similarly, the three-dimensional (3D) discretized Poisson problem can be written as [3]:

$$ \mathcal{A}*_{3}\mathcal{X}=\mathcal{C}, $$

where \(\mathcal{A}\in \mathbb{R}^{n\times n\times n\times n\times n \times n}\) and \(\mathcal{X},\mathcal{C}\in \mathbb{R}^{n\times n \times n}\).
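The 2D tensor \(\mathcal{A}\) can be assembled explicitly. The sketch below assumes the standard 5-point stencil on an \(n\times n\) interior grid with mesh width \(h=1/(n+1)\) and \(X_{ij}\approx v(x_{i},y_{j})\), which gives \((\mathcal{A}*_{2}X)_{ij}=(TX+XT)_{ij}\) with \(T=h^{-2}\operatorname{tridiag}(-1,2,-1)\). We cannot confirm this is exactly the construction used in [3]; the helper `poisson_tensor_2d` and the test function \(v=\sin (\pi x)\sin (\pi y)\) are our own choices.

```python
import numpy as np

def poisson_tensor_2d(n):
    """Hypothetical helper: 4th-order A with (A *_2 X)_{ij} = (T X + X T)_{ij}, T = tridiag(-1, 2, -1)/h^2."""
    h = 1.0 / (n + 1)
    T = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
    I = np.eye(n)
    # a_{i1 i2 k1 k2} = T_{i1 k1} delta_{i2 k2} + delta_{i1 k1} T_{i2 k2}
    return np.einsum("ac,bd->abcd", T, I) + np.einsum("ac,bd->abcd", I, T)

n = 20
h = 1.0 / (n + 1)
x = np.linspace(h, 1 - h, n)                        # interior grid points
Xg, Yg = np.meshgrid(x, x, indexing="ij")
v_exact = np.sin(np.pi * Xg) * np.sin(np.pi * Yg)   # vanishes on the boundary of the unit square
f = 2 * np.pi**2 * v_exact                          # f = -laplacian(v)

A = poisson_tensor_2d(n)

# A *_2 X agrees with the 5-point stencil applied on a zero-padded grid.
P = np.pad(v_exact, 1)
stencil = (4 * P[1:-1, 1:-1] - P[:-2, 1:-1] - P[2:, 1:-1] - P[1:-1, :-2] - P[1:-1, 2:]) / h**2
print(np.allclose(np.tensordot(A, v_exact, axes=2), stencil))  # True

# Solve A *_2 X = C by flattening the paired modes (row-major reshape).
X = np.linalg.solve(A.reshape(n * n, n * n), f.reshape(n * n)).reshape(n, n)
print(np.max(np.abs(X - v_exact)))  # small O(h^2) discretization error
```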

TVSTEs include the following special cases:

  1. (1).

    If \(m=n=1\), TVSTEs reduce to the time-varying Sylvester matrix equations (TVSMEs) [4]:

    $$ {A}(t){X}+{X} {B}(t)={C}(t), \quad t\geq 0, $$
    (4)

    where \(A(t)\), \(B(t)\), \(C(t)\), and X are matrices of compatible dimensions; if \({A}(t)\), \({B}(t)\), and \({C}(t)\) are furthermore time-invariant, TVSMEs reduce to the classic Sylvester matrix equations (SMEs):

    $$ {A} {X}+X{B}={C}, $$
    (5)

    which in turn include the well-known Lyapunov matrix equations and Stein matrix equations as special cases [5,6,7]. SMEs serve as basic models in control theory, system theory, stability analysis, etc.

  2. (2).

    If \(\mathcal{A}(t)\), \(\mathcal{B}(t)\) and \(\mathcal{C}(t)\) are time invariant, TVSTEs reduce to the Sylvester tensor equations (STEs) [8]:

    $$ \mathcal{A}*_{n}\mathcal{X}+ \mathcal{X}*_{m}\mathcal{B}=\mathcal{C}, $$
    (6)

    which arise from finite difference, finite element or spectral methods [9, 10] and play an important role in the discretization of high-dimensional linear partial differential equations.

A basic and important problem for the above equations is how to solve them. Over the past several decades, many researchers have studied analytical and numerical solutions of SMEs, TVSMEs and STEs. For example, Ding and Chen [11] presented a gradient-based iterative algorithm for SMEs. Zhang et al. [4] introduced a recurrent neural network with implicit dynamics for the approximate solution of TVSMEs. Wang and Xu [12] developed iterative algorithms for solving some tensor equations, which were generalized by Huang and Ma [13] to solve STEs. When the above equations are inconsistent, Lv and Zhang [14] designed an iterative algorithm to find the least squares solutions of SMEs. Sun and Wang [7] extended the conjugate gradient method to compute the least squares solution with the least Frobenius norm of the generalized periodic SMEs. In particular, over the past ten years, Hajarian et al. have conducted intensive research on iterative methods for solving various matrix equations, such as the generalized conjugate direction algorithm for the general coupled matrix equations over symmetric matrices [15], a finite algorithm for the generalized nonhomogeneous Yakubovich-transpose matrix equation [16], and the symmetric solutions of general Sylvester matrix equations via the Lanczos version of the biconjugate residual algorithm [17]. A tensor form of the conjugate gradient method was given to solve inconsistent tensor equations [12, 13]. Furthermore, by virtue of the Moore–Penrose inverse and the \(\{1\}\)-inverse of a tensor, Sun et al. [8] obtained analytic solutions of some special linear tensor equations. Other iterative methods for matrix/tensor equations and their applications can be found in [18,19,20,21,22,23,24,25,26,27] and the references therein.

To the best of the authors' knowledge, no research has yet been devoted to solving TVSTEs. In this paper, we extend a special kind of recurrent neural network, the continuous-time Zhang neural network (CTZNN), to solve TVSTEs. ZNN was proposed by Yunong Zhang in March 2001 [4] and is well suited to various time-varying problems, such as time-varying nonlinear optimization [28,29,30], time-varying convex quadratic programming [31], time-varying matrix pseudoinversion [32] and time-varying absolute value equations [33]. Motivated by [4, 12, 13], we design three noise-tolerant CTZNNs (NTCTZNN1-3) for solving TVSTEs (2). Convergence analysis shows that: (1) the residual error \(\Vert \mathcal{A}(t)*_{n}\mathcal{X}+\mathcal{X}*_{m}\mathcal{B}(t)- \mathcal{C}(t) \Vert \) generated by NTCTZNN1 and NTCTZNN3 can be made arbitrarily small if their design parameters are large enough; (2) the residual error of NTCTZNN2 converges globally to zero no matter how large the unknown noise is. Here \(\Vert \cdot \Vert \) denotes the Frobenius norm, defined as \(\Vert \mathcal{A} \Vert = ( \sum_{i_{1}\cdots i_{n}j_{1}\cdots j_{n}} \vert a_{i_{1}\cdots i_{n}j_{1} \cdots j_{n}} \vert ^{2} )^{1/2}\) for a tensor \(\mathcal{A}\in \mathbb{R}^{I_{1}\times \cdots \times I_{n}\times J_{1}\times \cdots \times J_{n}}\).

The remainder of this paper is organized as follows. In Sect. 2, we introduce the notation needed to derive the main results of this paper. In Sect. 3, we propose three NTCTZNNs and a GNN to solve TVSTEs and establish their convergence. In Sect. 4, we present two numerical examples to verify the effectiveness of the NTCTZNNs for solving TVSTEs. Finally, some brief conclusions are drawn in Sect. 5.

Preliminaries

In this section, we introduce some useful notation and recall some known results.

Definition 2.1

([12, 34])

  1. (1)

    For \(\mathcal{A}= (a_{i_{1}\cdots i_{n}j_{1}\cdots j_{m}} )\in \mathbb{R}^{I_{1}\times \cdots \times I_{n}\times J_{1}\times \cdots \times J_{m}}\) and \(\mathcal{B}= (b_{j_{1}\cdots j_{m} i_{1}\cdots i _{n}} )\in \mathbb{R}^{J_{1}\times \cdots \times J_{m}\times I_{1} \times \cdots \times I_{n}}\), if \(a_{i_{1}\cdots i_{n}j_{1}\cdots j _{m}}=b_{j_{1}\cdots j_{m} i_{1}\cdots i_{n}}\) for all indices, then the tensor \(\mathcal{B}\) is called the transpose of \(\mathcal{A}\), denoted by \(\mathcal{A}^{\top }\).

  2. (2)

    A tensor \(\mathcal{D}= (d_{i_{1}\cdots i_{n}j_{1}\cdots j_{n}} )\in \mathbb{R}^{I_{1}\times \cdots \times I_{n}\times J_{1} \times \cdots \times J_{n}}\) is a diagonal tensor if \(d_{i_{1}\cdots i_{n}j_{1}\cdots j_{n}}=0\) whenever the index tuple \((i_{1},\ldots ,i_{n})\) differs from \((j_{1},\ldots ,j_{n})\). If, in addition, all the diagonal entries satisfy \(d_{i_{1}\cdots i_{n}i_{1}\cdots i_{n}}=1\), then \(\mathcal{D}\) is called the unit tensor, denoted by \(\mathcal{I}\). A tensor whose entries are all zero is called the zero tensor, denoted by \(\mathcal{O}\).

  3. (3)

    The trace of a tensor \(\mathcal{A}\in \mathbb{R}^{I_{1}\times \cdots \times I_{n}\times I_{1}\times \cdots \times I_{n}}\) is defined as \(\operatorname{tr}(\mathcal{A})=\sum_{i_{1}\cdots i_{n}}a _{i_{1}\cdots i_{n}i_{1}\cdots i_{n}}\).

  4. (4)

    For \(\mathcal{A},\mathcal{B}\in \mathbb{R}^{I_{1}\times \cdots \times I_{n}\times I_{1}\times \cdots \times I_{n}}\), the inner product of \(\mathcal{A}\) and \(\mathcal{B}\) is defined by

    $$ \langle \mathcal{A},\mathcal{B}\rangle = \operatorname{tr}\bigl(\mathcal{B}^{ \top }*_{n}\mathcal{A} \bigr). $$
    (7)
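The operations of Definition 2.1 map directly onto array operations. A minimal NumPy sketch follows (all function names are hypothetical). It also checks that the inner product (7) coincides with the entrywise sum of products, so that \(\langle \mathcal{A},\mathcal{A}\rangle \) is the squared Frobenius norm defined in the Introduction.

```python
import numpy as np

def tensor_transpose(A, n):
    """A^T for A of shape I_1..I_n x J_1..J_m: move the first n modes behind the last m modes."""
    m = A.ndim - n
    return np.transpose(A, list(range(n, n + m)) + list(range(n)))

def tensor_trace(A):
    """tr(A) for A of shape I_1..I_n x I_1..I_n: the sum of the entries a_{i_1..i_n i_1..i_n}."""
    n = A.ndim // 2
    size = int(np.prod(A.shape[:n]))
    # A row-major reshape puts matching multi-indices on the matrix diagonal.
    return np.trace(A.reshape(size, size))

def inner(A, B):
    """<A, B> = tr(B^T *_n A), as in (7)."""
    n = A.ndim // 2
    return tensor_trace(np.tensordot(tensor_transpose(B, n), A, axes=n))

def unit_tensor(dims):
    """Unit tensor of shape dims x dims: ones where the two index tuples coincide, zeros elsewhere."""
    size = int(np.prod(dims))
    return np.eye(size).reshape(tuple(dims) + tuple(dims))

rng = np.random.default_rng(0)
A, B = rng.standard_normal((2, 3, 2, 3)), rng.standard_normal((2, 3, 2, 3))
print(np.isclose(inner(A, B), np.sum(A * B)))                         # True
X = rng.standard_normal((2, 3, 4))
print(np.allclose(np.tensordot(unit_tensor((2, 3)), X, axes=2), X))   # True: I *_2 X = X
```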

In the following, we set \(I=I_{1}I_{2}\cdots I_{n}\), \(J=J_{1}J_{2} \cdots J_{n}\), \(K=K_{1}K_{2}\cdots K_{m}\). Motivated by Definition 2.3 in [12], we give the following definition.

Definition 2.2

Define the transformation \(\varPhi _{IK}: \mathbb{R}^{I_{1}\times \cdots \times I_{n}\times K_{1}\times \cdots \times K_{m}}\rightarrow \mathbb{R}^{I\times K}\) with \(\varPhi _{IK}( \mathcal{A})=A\) defined component-wise as

$$ (\mathcal{A})_{i_{1}\cdots i_{n}k_{1}\cdots k_{m}}\rightarrow (A)_{st}, $$
(8)

where \(\mathcal{A}\in \mathbb{R}^{I_{1}\times \cdots \times I_{n} \times K_{1}\times \cdots \times K_{m}}\), \(A\in \mathbb{R}^{I\times K}\), \(s=i_{n}+\sum_{p=1}^{n-1}((i_{p}-1)\prod_{q=p+1}^{n}I_{q})\) and \(t=k_{m}+\sum_{p=1}^{m-1}((k_{p}-1)\prod_{q=p+1}^{m}K_{q})\). Here we use the convention \(\sum_{p=1}^{0}a_{p}=0\).

Example 2.1

Let \(\mathcal{A}=(a_{i_{1}i_{2}k_{1}k_{2}k_{3}}) \in \mathbb{R}^{2\times 2\times 2\times 2\times 2}\) be a tensor such that

$$\begin{aligned}& \mathcal{A}(:,:,1,1,1)= \begin{bmatrix} 1&2 \\ 3&4 \end{bmatrix} , \quad\quad \mathcal{A}(:,:,1,1,2)= \begin{bmatrix} 5&6 \\ 7&8 \end{bmatrix} , \quad\quad \mathcal{A}(:,:,1,2,1)= \begin{bmatrix} 9&10 \\ 11&12 \end{bmatrix} , \\& \mathcal{A}(:,:,1,2,2)= \begin{bmatrix} 13&14 \\ 15&16 \end{bmatrix} , \quad\quad \mathcal{A}(:,:,2,1,1)= \begin{bmatrix} 17&18 \\ 19&20 \end{bmatrix} , \\& \mathcal{A}(:,:,2,1,2)= \begin{bmatrix} 21&22 \\ 23&24 \end{bmatrix} , \quad\quad\quad \mathcal{A}(:,:,2,2,1)= \begin{bmatrix} 25&26 \\ 27&28 \end{bmatrix} , \\& \mathcal{A}(:,:,2,2,2)= \begin{bmatrix} 29&30 \\ 31&32 \end{bmatrix} . \end{aligned}$$

By Definition 2.2, we have

$$ A= \begin{bmatrix} 1&5&9&13&17&21&25&29 \\ 2&6&10&14&18&22&26&30 \\ 3&7&11&15&19&23&27&31 \\ 4&8&12&16&20&24&28&32 \end{bmatrix} . $$
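The formulas for s and t in Definition 2.2 enumerate multi-indices with the last index varying fastest, i.e., in row-major order, so in NumPy \(\varPhi _{IK}\) amounts to a C-order `reshape`. The sketch below (the name `phi` is ours) rebuilds the tensor of Example 2.1 and checks that it reproduces the matrix A above.

```python
import numpy as np

def phi(T, n):
    """Phi_{IK} of Definition 2.2: merge the first n modes into rows and the rest into columns.
    s and t list the multi-indices with the last index fastest, i.e. NumPy's C-order reshape."""
    return T.reshape(int(np.prod(T.shape[:n])), -1)

# Tensor of Example 2.1 (0-based indices in code): the slice A(:, :, k1, k2, k3) is
# [[4c+1, 4c+2], [4c+3, 4c+4]] with c = 4*k1 + 2*k2 + k3.
slices = np.arange(1, 33).reshape(2, 2, 2, 2, 2)   # axes ordered (k1, k2, k3, i1, i2)
A_tensor = np.transpose(slices, (3, 4, 0, 1, 2))  # reorder to (i1, i2, k1, k2, k3)

A = phi(A_tensor, 2)
expected = np.arange(1, 33).reshape(8, 4).T        # the 4 x 8 matrix displayed in Example 2.1
print(np.array_equal(A, expected))                 # True
```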

Remark 2.1

Given the positive integers \(I_{1},\ldots,I _{n}\), \(K_{1},\ldots,K_{m}\), we can define an inverse function of \(\varPhi _{IK}: \mathbb{R}^{I_{1}\times \cdots \times I_{n}\times K_{1} \times \cdots \times K_{m}}\rightarrow \mathbb{R}^{I\times K}\) defined in Definition 2.2 as follows:

$$ \varPhi _{IK}^{-1}: \mathbb{R}^{I\times K} \rightarrow \mathbb{R}^{I_{1} \times \cdots \times I_{n}\times K_{1}\times \cdots \times K_{m}}, $$

with \((A)_{st}\rightarrow (\mathcal{A})_{i_{1}\cdots i_{n}k_{1}\cdots k_{m}}\), where the jth column of the matrix A consists of the entries of the jth element of the set \(\{\mathcal{A}(:,\ldots,:,k_{1},\ldots,k_{m})\mid \forall k_{1},\ldots,k_{m}\}\), arranged according to the row index s of Definition 2.2. Here the elements of this set are sorted in lexicographic order, that is, from \((1,\ldots,1)\) to \((K_{1},\ldots,K_{m})\).

It is well known that, using the Kronecker product and the vectorization operator, one can transform a linear matrix equation into a linear system. Similarly, the following proposition shows that the transformation \(\varPhi _{IK}\) transforms a linear tensor equation into a linear matrix equation.

Proposition 2.1

([12])

Let \(\varPhi _{IJ}\), \(\varPhi _{JK}\) and \(\varPhi _{IK}\) be defined as in Definition 2.2. Then

$$ \mathcal{A}*_{n}\mathcal{X}=\mathcal{C} \quad \Longleftrightarrow \quad \varPhi _{IJ}( \mathcal{A}) \varPhi _{JK}(\mathcal{X})=\varPhi _{IK}(\mathcal{C}), $$
(9)

where the tensors \(\mathcal{A}\in \mathbb{R}^{I_{1}\times \cdots \times I_{n}\times J_{1}\times \cdots \times J_{n}}\), \(\mathcal{X} \in \mathbb{R}^{J_{1}\times \cdots \times J_{n}\times K_{1}\times \cdots \times K_{m}}\), \(\mathcal{C}\in \mathbb{R}^{I_{1}\times \cdots \times I_{n}\times K_{1}\times \cdots \times K_{m}}\).
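Proposition 2.1 and Remark 2.1 can be checked with the same reshape-based sketch of \(\varPhi \) (helper names are ours): the Einstein product becomes an ordinary matrix product after applying \(\varPhi \), and \(\varPhi ^{-1}\) is simply the reverse reshape.

```python
import numpy as np

def phi(T, n):
    """Phi of Definition 2.2: merge the first n modes into rows, the rest into columns (row-major)."""
    return T.reshape(int(np.prod(T.shape[:n])), -1)

def phi_inv(M, dims):
    """Phi^{-1} of Remark 2.1: restore the tensor of shape `dims` from its matrix form."""
    return M.reshape(dims)

rng = np.random.default_rng(1)
n = 2
A = rng.standard_normal((2, 3, 4, 2))   # I1 x I2 x J1 x J2
X = rng.standard_normal((4, 2, 3))      # J1 x J2 x K1 (m = 1)
C = np.tensordot(A, X, axes=n)          # C = A *_2 X, shape (2, 3, 3)

# Proposition 2.1: Phi_IK(A *_n X) = Phi_IJ(A) Phi_JK(X)
print(np.allclose(phi(C, n), phi(A, n) @ phi(X, n)))    # True
# Remark 2.1: Phi^{-1} undoes Phi
print(np.array_equal(phi_inv(phi(C, n), C.shape), C))   # True
```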

The following lemma is a direct generalization of the product rule for differentiating a matrix product.

Lemma 2.1

Let \(\mathcal{A}(t)\in \mathbb{R}^{I_{1}\times \cdots \times I_{n}\times J_{1}\times \cdots \times J_{n}}\), \(\mathcal{X}(t)\in \mathbb{R}^{J_{1}\times \cdots \times J_{n}\times K_{1}\times \cdots \times K_{m}}\). We have

$$ \bigl(\mathcal{A}(t)*_{n}\mathcal{X}(t) \bigr)'_{t}= \bigl(\mathcal{A}(t) \bigr)'_{t}*_{n}\mathcal{X}(t)+ \mathcal{A}(t)*_{n} \bigl(\mathcal{X}(t) \bigr)'_{t}. $$
(10)

Proof

Assume that \(\varPhi _{IJ}(\mathcal{A}(t))=A(t)\), \(\varPhi _{JK}(\mathcal{X}(t))=X(t)\). Then, by Definition 2.2, one has \(A(t)\in \mathbb{R}^{I\times J}\), \(X(t)\in \mathbb{R}^{J\times K}\). By the product rule for matrices, we have

$$\begin{aligned}& \bigl( \bigl(\mathcal{A}(t)*_{n}\mathcal{X}(t) \bigr)'_{t} \bigr)_{i_{1} \cdots i_{n}k_{1}\cdots k_{m}} \\& \quad = \bigl( \bigl(\mathcal{A}(t)*_{n}\mathcal{X}(t) \bigr)_{i_{1}\cdots i _{n}k_{1}\cdots k_{m}} \bigr)'_{t} \\& \quad = \biggl(\sum_{j_{1}\cdots j_{n}}a_{i_{1}\cdots i_{n}j_{1}\cdots j_{n}}(t)x _{j_{1}\cdots j_{n} k_{1}\cdots k_{m}}(t) \biggr)'_{t} \\& \quad = \Biggl(\sum_{v=1}^{J}A_{sv}(t)X_{vw}(t) \Biggr)'_{t} \\& \quad =\sum_{v=1}^{J} \bigl(A_{sv}(t)X_{vw}(t) \bigr)'_{t} \\& \quad =\sum_{v=1}^{J} \bigl( \bigl(A_{sv}(t) \bigr)'_{t} X_{vw}(t)+A_{sv}(t) \bigl(X_{vw}(t) \bigr)'_{t} \bigr) \\& \quad = \bigl( \bigl(A(t)X(t) \bigr)'_{t} \bigr)_{sw} \\& \quad = \bigl( \bigl(A(t) \bigr)'_{t} X(t)+A(t) \bigl(X(t) \bigr)'_{t} \bigr)_{sw} \\& \quad = \bigl(\varPhi ^{-1}_{IK} \bigl[ \bigl(A(t) \bigr)'_{t} X(t)+A(t) \bigl(X(t) \bigr)'_{t} \bigr] \bigr)_{i_{1}\cdots i_{n}k_{1}\cdots k_{m}} \\& \quad = \bigl(\varPhi ^{-1}_{IK} \bigl[ \bigl(\varPhi _{IJ}\bigl(\mathcal{A}(t)\bigr) \bigr)'_{t} \varPhi _{JK}\bigl(\mathcal{X}(t)\bigr)+\varPhi _{IJ} \bigl(\mathcal{A}(t)\bigr) \bigl(\varPhi _{JK}\bigl( \mathcal{X}(t) \bigr) \bigr)'_{t} \bigr] \bigr)_{i_{1}\cdots i_{n}k_{1}\cdots k _{m}} \\& \quad = \bigl(\varPhi ^{-1}_{IK} \bigl[\varPhi _{IK} \bigl( \bigl(\mathcal{A}(t) \bigr)'_{t}*_{n} \mathcal{X}(t) \bigr)+\varPhi _{IK} \bigl(\mathcal{A}(t)*_{n} \bigl( \mathcal{X}(t) \bigr)'_{t} \bigr) \bigr] \bigr)_{i_{1}\cdots i_{n}k_{1}\cdots k_{m}} \\& \quad = \bigl( \bigl(\mathcal{A}(t) \bigr)'_{t}*_{n} \mathcal{X}(t)+\mathcal{A}(t)*_{n} \bigl(\mathcal{X}(t) \bigr)'_{t} \bigr)_{i_{1}\cdots i_{n}k_{1}\cdots k _{m}}, \end{aligned}$$

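where s, v and w are the indices assigned by Definition 2.2 to \((i_{1},\ldots ,i_{n})\), \((j_{1},\ldots ,j_{n})\) and \((k_{1},\ldots ,k_{m})\), respectively, and the second-to-last equality follows from Proposition 2.1. This completes the proof. □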
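A quick numerical sanity check of (10), using smooth test data of our own choosing (the functions below are hypothetical), compares a central finite difference of \(\mathcal{A}(t)*_{2}\mathcal{X}(t)\) with the right-hand side of (10):

```python
import numpy as np

rng = np.random.default_rng(2)
A0, A1 = rng.standard_normal((2, 2, 3, 3)), rng.standard_normal((2, 2, 3, 3))
X0, X1 = rng.standard_normal((3, 3, 4)), rng.standard_normal((3, 3, 4))

def A(t):   # hypothetical A(t) = A0 + sin(t) A1
    return A0 + np.sin(t) * A1

def dA(t):  # its exact derivative
    return np.cos(t) * A1

def X(t):   # hypothetical X(t) = X0 + t^2 X1
    return X0 + t**2 * X1

def dX(t):  # its exact derivative
    return 2 * t * X1

def product(t):
    return np.tensordot(A(t), X(t), axes=2)   # A(t) *_2 X(t)

t, h = 0.7, 1e-5
lhs = (product(t + h) - product(t - h)) / (2 * h)                              # central difference
rhs = np.tensordot(dA(t), X(t), axes=2) + np.tensordot(A(t), dX(t), axes=2)    # right side of (10)
print(np.max(np.abs(lhs - rhs)))   # tiny: only finite-difference and round-off error
```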

NTCTZNNs and GNN for TVSTEs

In this section, we present three noise-tolerant continuous-time Zhang neural networks (NTCTZNNs) and a gradient-based neural network (GNN) for TVSTEs, and we analyze their convergence properties.

First, following Zhang et al.'s design method [4], we define the following tensor-valued indefinite error-function:

$$ \mathcal{E}(t)=\mathcal{A}(t)*_{n} \mathcal{X}+\mathcal{X}*_{m} \mathcal{B}(t)-\mathcal{C}(t)\in \mathbb{R}^{I_{1}\times \cdots \times I_{n}\times K_{1}\times \cdots \times K_{m}}, $$
(11)

where every entry denoted by \(e_{i_{1}\cdots i_{n}k_{1}\cdots k_{m}}(t) \in \mathbb{R}\) may be negative or unbounded.

Then, let us recall the Zhang neural network (ZNN) design formula (also termed Zhang et al.’s design formula)

$$ \dot{\mathcal{E}}(t)=-\gamma F\bigl(\mathcal{E}(t)\bigr), $$

where \(\gamma >0\) is a design parameter, \(F(\cdot )\) is a tensor activation function defined on every entry of the error-function \(\mathcal{E}(t)\), and \(\dot{\mathcal{E}}(t)\) is a component-wise derivative of the error-function \(\mathcal{E}(t)\). By (10) in Lemma 2.1, substituting the error-function \(\mathcal{E}(t)\) into the above Zhang neural network (ZNN) design formula, we get a continuous-time ZNN (CTZNN):

$$ \begin{aligned}[b] &\mathcal{A}(t)*_{n} \dot{\mathcal{X}}+\dot{\mathcal{X}}*_{m} \mathcal{B}(t) \\ &\quad =-\gamma F\bigl(\mathcal{A}(t)*_{n}\mathcal{X}+ \mathcal{X}*_{m} \mathcal{B}(t)-\mathcal{C}(t)\bigr)-\dot{ \mathcal{A}}(t)*_{n}\mathcal{X}- \mathcal{X}*_{m}\dot{ \mathcal{B}}(t)+\dot{\mathcal{C}}(t), \end{aligned} $$
(12)

where \(\dot{\mathcal{A}}(t)\), \(\dot{\mathcal{B}}(t)\), \(\dot{\mathcal{C}}(t)\) denote the time-derivatives of tensors \(\mathcal{A}(t)\), \(\mathcal{B}(t)\), \(\mathcal{C}(t)\), respectively.
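Implementing (12) requires solving a linear tensor equation for \(\dot{\mathcal{X}}\) at every time instant. One way to do this, sketched below under our own assumptions (NumPy, hypothetical helper names, and a dense Kronecker solve that only suits small IK), is to map the equation to matrix form with Proposition 2.1 and then vectorize column by column:

```python
import numpy as np

def phi(T, n):
    """Phi of Definition 2.2 (row-major merge of the first n modes, and of the remaining modes)."""
    return T.reshape(int(np.prod(T.shape[:n])), -1)

def ctznn_xdot(A, B, C, dA, dB, dC, X, n, gamma, act=lambda e: e):
    """Hypothetical helper: Xdot from CTZNN (12) at one time instant.

    A, B, C and dA, dB, dC are A(t), B(t), C(t) and their time derivatives at that instant;
    X has shape I_1..I_n x K_1..K_m, and act is the entrywise activation f (linear by default)."""
    m = X.ndim - n
    E = np.tensordot(A, X, axes=n) + np.tensordot(X, B, axes=m) - C      # error-function (11)
    R = -gamma * act(E) - np.tensordot(dA, X, axes=n) - np.tensordot(X, dB, axes=m) + dC
    # Proposition 2.1 turns A *_n Xdot + Xdot *_m B = R into Am Xd + Xd Bm = Rm, and
    # column-major vectorization gives (I_K kron Am + Bm^T kron I_I) vec(Xd) = vec(Rm).
    Am, Bm, Rm = phi(A, n), phi(B, m), phi(R, n)
    I_dim, K_dim = Rm.shape
    M = np.kron(np.eye(K_dim), Am) + np.kron(Bm.T, np.eye(I_dim))
    xd = np.linalg.solve(M, Rm.flatten(order="F"))
    return xd.reshape(I_dim, K_dim, order="F").reshape(X.shape)       # un-vec, then Phi^{-1}

# Usage on random data with n = 2, m = 1 (I_1 = I_2 = 2, K_1 = 3).
rng = np.random.default_rng(3)
A = (4 * np.eye(4) + 0.3 * rng.standard_normal((4, 4))).reshape(2, 2, 2, 2)
B = 3 * np.eye(3) + 0.3 * rng.standard_normal((3, 3))
dA, dB = rng.standard_normal((2, 2, 2, 2)), rng.standard_normal((3, 3))
C, dC, X = (rng.standard_normal((2, 2, 3)) for _ in range(3))
gamma = 10.0
Xd = ctznn_xdot(A, B, C, dA, dB, dC, X, n=2, gamma=gamma)

# Check that Xd satisfies (12) in tensor form.
E = np.tensordot(A, X, axes=2) + np.tensordot(X, B, axes=1) - C
lhs = np.tensordot(A, Xd, axes=2) + np.tensordot(Xd, B, axes=1)
rhs = -gamma * E - np.tensordot(dA, X, axes=2) - np.tensordot(X, dB, axes=1) + dC
print(np.allclose(lhs, rhs))  # True
```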

Remark 3.1

Let \(f(\cdot )\) denote the scalar function that \(F(\cdot )\) applies to each entry. It can be chosen as any odd and strictly monotonically increasing function. Six basic types of activation functions are used in the literature: the linear, power, bipolar sigmoid, power-sigmoid, sign-bi-power and power-sum activation functions [35]. Different activation functions lead to different numerical performance of CTZNN (12). Generally speaking, CTZNN with the power-sigmoid activation function performs better than CTZNN with the linear activation function, and CTZNN with the sign-bi-power activation function often converges in finite time.
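For concreteness, here are the six activation families in one common parameterization from the ZNN literature. The parameter names and defaults (p, xi, r, N) are illustrative assumptions, and [35] may use different variants. Each function below is odd and strictly increasing, as the remark requires.

```python
import numpy as np

# Hypothetical parameter defaults; the literature uses several variants of each family.
def linear(e):
    return e

def power(e, p=3):                    # odd integer p >= 3
    return e ** p

def bipolar_sigmoid(e, xi=4.0):       # (1 - exp(-xi e)) / (1 + exp(-xi e)), written stably as tanh
    return np.tanh(xi * e / 2)

def power_sigmoid(e, p=3, xi=4.0):
    # power branch for |e| >= 1, rescaled bipolar sigmoid for |e| < 1 (the pieces agree at |e| = 1)
    scale = (1 + np.exp(-xi)) / (1 - np.exp(-xi))
    return np.where(np.abs(e) >= 1, e ** p, scale * bipolar_sigmoid(e, xi))

def sign_bi_power(e, r=0.5):          # 0 < r < 1
    return 0.5 * np.sign(e) * (np.abs(e) ** r + np.abs(e) ** (1 / r))

def power_sum(e, N=3):                # e + e^3 + ... + e^(2N-1)
    return sum(e ** (2 * k - 1) for k in range(1, N + 1))

e = np.linspace(-2, 2, 401)
for f in (linear, power, bipolar_sigmoid, power_sigmoid, sign_bi_power, power_sum):
    print(f.__name__, np.allclose(f(-e), -f(e)), np.all(np.diff(f(e)) > 0))   # odd, increasing
```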

CTZNN (12) is suitable for solving TVSTEs without noise. However, noise is ubiquitous in practice and should not be ignored. For simplicity, we only consider constant noise \(\eta (t)=\eta \) in what follows. Let \(\bar{\eta }=(\eta )\in \mathbb{R}^{I_{1}\times \cdots \times I_{n}\times K_{1}\times \cdots \times K_{m}}\) denote the tensor whose entries all equal η. Based on the results about CTZNNs in the literature [36,37,38], we now propose three noise-tolerant CTZNNs for TVSTEs.

(1) Setting \(e(t)=\mathcal{E}(t)\) in the following improved Zhang design formula:

$$ \dot{e}(t)=-\gamma e(t)+\bar{\eta }, \quad \gamma >0, $$
(13)

we get the first noise-tolerant CTZNN (NTCTZNN1) for TVSTEs:

$$ \begin{aligned}[b] &\mathcal{A}(t)*_{n} \dot{\mathcal{X}}+\dot{\mathcal{X}}*_{m} \mathcal{B}(t) \\ &\quad =-\gamma \bigl(\mathcal{A}(t)*_{n}\mathcal{X}+ \mathcal{X}*_{m} \mathcal{B}(t)-\mathcal{C}(t)\bigr)-\dot{ \mathcal{A}}(t)*_{n}\mathcal{X}- \mathcal{X}*_{m}\dot{ \mathcal{B}}(t)+\dot{\mathcal{C}}(t)+\bar{\eta }, \end{aligned} $$
(14)

whose convergence is given in the following theorem.

Theorem 3.1

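The residual error of NTCTZNN1 (14) satisfies \(\lim_{t\rightarrow \infty }\Vert \mathcal{E}(t)\Vert =\sqrt{IK}\vert \eta \vert /\gamma \); hence NTCTZNN1 (14) approaches the theoretical solution of TVSTEs as the design parameter γ increases.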

Proof

Obviously, NTCTZNN1 (14) can be written as

$$ \dot{\mathcal{E}}(t)=-\gamma \mathcal{E}(t)+\bar{\eta }, $$

which can be decoupled into the following IK differential equations:

$$ \dot{e}_{i_{1}\cdots i_{n}k_{1}\cdots k_{m}}(t)=-\gamma e_{i_{1} \cdots i_{n}k_{1}\cdots k_{m}}(t)+\eta , \quad \forall i_{p}, k_{q}, p\in \langle n\rangle , q\in \langle m \rangle , $$

whose solutions are

$$ e_{i_{1}\cdots i_{n}k_{1}\cdots k_{m}}(t)= \biggl(e_{i_{1}\cdots i_{n}k_{1} \cdots k_{m}}(0)-\frac{\eta }{\gamma } \biggr)\exp(-\gamma t)+\frac{\eta }{\gamma }, \quad \forall i_{p}, k_{q}, p\in \langle n\rangle , q \in \langle m\rangle . $$

Hence

$$ \lim_{t\rightarrow \infty }e_{i_{1}\cdots i_{n}k_{1}\cdots k_{m}}(t)=\frac{ \eta }{\gamma }, \quad \forall i_{p}, k_{q}, p\in \langle n\rangle , q\in \langle m\rangle . $$

So

$$ \lim_{t\rightarrow +\infty } \bigl\Vert \mathcal{E}(t) \bigr\Vert = \frac{\sqrt{IK} \vert \eta \vert }{\gamma }. $$

This completes the proof. □
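Theorem 3.1 can be checked numerically. The sketch below integrates NTCTZNN1 (14) in the matrix form given by \(\varPhi \) for a hypothetical, well-conditioned test problem of our own choosing (the coefficient functions, γ, η, and the RK4 integrator are assumptions, not the setup of Sect. 4) and compares the final residual with \(\sqrt{IK}\vert \eta \vert /\gamma \).

```python
import numpy as np

# Hypothetical test problem in matrix form (Phi maps (2) to A X + X B = C by Proposition 2.1).
I_dim, K_dim = 3, 2
rng = np.random.default_rng(4)
SA, SB = rng.standard_normal((I_dim, I_dim)), rng.standard_normal((K_dim, K_dim))
C0 = rng.standard_normal((I_dim, K_dim))
A = lambda t: 4 * np.eye(I_dim) + 0.3 * np.sin(t) * SA
dA = lambda t: 0.3 * np.cos(t) * SA
B = lambda t: 3 * np.eye(K_dim) + 0.3 * np.cos(t) * SB
dB = lambda t: -0.3 * np.sin(t) * SB
C = lambda t: np.sin(2 * t) * C0 + 1.0
dC = lambda t: 2 * np.cos(2 * t) * C0

def residual(t, X):
    """Error-function (11) in matrix form."""
    return A(t) @ X + X @ B(t) - C(t)

def xdot(t, X, gamma, eta):
    """NTCTZNN1 (14): solve A Xd + Xd B = -gamma E - dA X - X dB + dC + eta for Xd."""
    rhs = -gamma * residual(t, X) - dA(t) @ X - X @ dB(t) + dC(t) + eta
    M = np.kron(np.eye(K_dim), A(t)) + np.kron(B(t).T, np.eye(I_dim))   # column-major vec
    return np.linalg.solve(M, rhs.flatten(order="F")).reshape(I_dim, K_dim, order="F")

def rk4(f, X, T, h):
    """Classical fourth-order Runge-Kutta from t = 0 to t = T with step h."""
    t = 0.0
    for _ in range(int(round(T / h))):
        k1 = f(t, X)
        k2 = f(t + h / 2, X + h / 2 * k1)
        k3 = f(t + h / 2, X + h / 2 * k2)
        k4 = f(t + h, X + h * k3)
        X, t = X + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4), t + h
    return X, t

gamma, eta = 10.0, 2.0
X_T, T = rk4(lambda t, X: xdot(t, X, gamma, eta), np.zeros((I_dim, K_dim)), T=3.0, h=1e-3)
print(np.linalg.norm(residual(T, X_T)))            # observed residual
print(np.sqrt(I_dim * K_dim) * abs(eta) / gamma)   # limit predicted by Theorem 3.1
```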

(2) Setting \(e(t)=\mathcal{E}(t)\) in the following improved Zhang design formula [36]:

$$ \dot{e}(t)=-\gamma _{1}e(t)-\gamma _{2} \int _{0}^{t}e(s)\,ds+\bar{\eta }, \quad \gamma _{1}>0, \gamma _{2}>0, $$
(15)

we get the second noise-tolerant CTZNN (NTCTZNN2) for TVSTEs:

$$\begin{aligned} \begin{aligned}[b] &\mathcal{A}(t)*_{n} \dot{\mathcal{X}}+\dot{\mathcal{X}}*_{m} \mathcal{B}(t) \\ &\quad =-\gamma _{1}\bigl(\mathcal{A}(t)*_{n} \mathcal{X}+\mathcal{X}*_{m} \mathcal{B}(t)-\mathcal{C}(t)\bigr)- \gamma _{2} \int _{0}^{t} \bigl(\mathcal{A}(s)*_{n} \mathcal{X}+\mathcal{X}*_{m}\mathcal{B}(s)-\mathcal{C}(s) \bigr)\,ds \\ &\quad\quad{} -\dot{\mathcal{A}}(t)*_{n}\mathcal{X}- \mathcal{X}*_{m} \dot{\mathcal{B}}(t)+\dot{\mathcal{C}}(t)+\bar{\eta }. \end{aligned} \end{aligned}$$
(16)

Theorem 3.2

No matter how large the unknown noise η is, the residual error of NTCTZNN2 (16) converges globally to zero, i.e., NTCTZNN2 (16) converges globally to the theoretical solution of TVSTEs.

Proof

Obviously, NTCTZNN2 (16) can be written as

$$ \dot{\mathcal{E}}(t)=-\gamma _{1}\mathcal{E}(t)-\gamma _{2} \int _{0}^{t} \mathcal{E}(s)\,ds+\bar{\eta }, $$

which can be decoupled into IK differential equations:

$$ \dot{e}_{i_{1}\cdots i_{n}k_{1}\cdots k_{m}}(t)=-\gamma _{1}e_{i_{1} \cdots i_{n}k_{1}\cdots k_{m}}(t)-\gamma _{2} \int _{0}^{t} \bigl(e_{i _{1}\cdots i_{n}k_{1}\cdots k_{m}}(s) \bigr)\,ds+\eta . $$
(17)

Taking Laplace transformation on both sides of (17), one has

$$ s\varepsilon _{i_{1}\cdots i_{n}k_{1}\cdots k_{m}}(s)-e_{i_{1}\cdots i _{n}k_{1}\cdots k_{m}}(0)=- \gamma _{1} \varepsilon _{i_{1}\cdots i_{n}k_{1}\cdots k_{m}}(s)- \frac{\gamma _{2}}{s} \varepsilon _{i_{1}\cdots i_{n}k_{1}\cdots k_{m}}(s)+\frac{ \eta }{s}, $$
(18)

where \(\varepsilon _{i_{1}\cdots i_{n}k_{1}\cdots k_{m}}(s)\) is the Laplace transform (image function) of \(e_{i_{1}\cdots i_{n}k_{1}\cdots k_{m}}(t)\). From (18), we have

$$ \varepsilon _{i_{1}\cdots i_{n}k_{1}\cdots k_{m}}(s)=\frac{se_{i_{1} \cdots i_{n}k_{1}\cdots k_{m}}(0)+\eta }{s^{2}+s\gamma _{1}+\gamma _{2}}. $$

Two poles of its transfer function are

$$ s_{1,2}=\frac{-\gamma _{1}\pm \sqrt{\gamma _{1}^{2}-4\gamma _{2}}}{2}, $$

which lie in the open left half-plane for any \(\gamma _{1}, \gamma _{2}>0\). Thus the system (17) is stable and the final value theorem applies. That is,

$$ \lim_{t\rightarrow \infty }e_{i_{1}\cdots i_{n}k_{1}\cdots k_{m}}(t)= \lim_{s\rightarrow 0}s \varepsilon _{i_{1}\cdots i_{n}k_{1}\cdots k_{m}}(s)= \lim_{s\rightarrow 0} \frac{s^{2}e_{i_{1}\cdots i_{n}k_{1}\cdots k_{m}}(0)+s \eta }{s^{2}+s\gamma _{1}+\gamma _{2}}=0. $$

This completes the proof. □

Remark 3.2

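If \(\gamma _{2}=0\), NTCTZNN2 (16) reduces to NTCTZNN1 (14) with \(\gamma =\gamma _{1}\). Although NTCTZNN2 (16) is more complicated than NTCTZNN1 (14) when \(\gamma _{2}>0\), Theorems 3.1 and 3.2 show that it is more robust to constant noise: its residual error converges to zero, whereas that of NTCTZNN1 (14) settles at a nonzero level proportional to \(\vert \eta \vert /\gamma \).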

Remark 3.3

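In practical computation, to avoid evaluating the integral, it is preferable to transform the integro-differential equations NTCTZNN2 (16) into the following differential equations by introducing the auxiliary tensor \(\mathcal{Y}(t)=\int _{0}^{t}(\mathcal{A}(s)*_{n}\mathcal{X}+\mathcal{X}*_{m}\mathcal{B}(s)-\mathcal{C}(s))\,ds\), so that \(\mathcal{Y}(0)=\mathcal{O}\):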

$$ \textstyle\begin{cases} \mathcal{A}(t)*_{n}\dot{\mathcal{X}}+\dot{\mathcal{X}}*_{m} \mathcal{B}(t)=-\gamma _{1}(\mathcal{A}(t)*_{n}\mathcal{X}+\mathcal{X}*_{m} \mathcal{B}(t)-\mathcal{C}(t))-\gamma _{2}\mathcal{Y}(t) \\ \hphantom{\mathcal{A}(t)*_{n}\dot{\mathcal{X}}+\dot{\mathcal{X}}*_{m} \mathcal{B}(t)}\quad {} -\dot{\mathcal{A}}(t)*_{n}\mathcal{X}-\mathcal{X}*_{m} \dot{\mathcal{B}}(t)+\dot{\mathcal{C}}(t)+\bar{\eta }, \\ \dot{\mathcal{Y}}(t)=\mathcal{A}(t)*_{n}\mathcal{X}+\mathcal{X}*_{m} \mathcal{B}(t)-\mathcal{C}(t). \end{cases} $$
(19)
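The differential form (19) is what one would integrate numerically. The hedged sketch below reuses a hypothetical test problem in matrix form, stacks \(X\) and the auxiliary \(Y\) into one state, and also runs the same code with \(\gamma _{2}=0\), which recovers NTCTZNN1 (14) for contrast. With a large noise level (η = 100, our choice), the NTCTZNN2 residual should vanish, while the NTCTZNN1 residual should settle near \(\sqrt{IK}\eta /\gamma _{1}\).

```python
import numpy as np

# Hypothetical test problem in matrix form (see Proposition 2.1).
I_dim, K_dim = 3, 2
rng = np.random.default_rng(4)
SA, SB = rng.standard_normal((I_dim, I_dim)), rng.standard_normal((K_dim, K_dim))
C0 = rng.standard_normal((I_dim, K_dim))
A = lambda t: 4 * np.eye(I_dim) + 0.3 * np.sin(t) * SA
dA = lambda t: 0.3 * np.cos(t) * SA
B = lambda t: 3 * np.eye(K_dim) + 0.3 * np.cos(t) * SB
dB = lambda t: -0.3 * np.sin(t) * SB
C = lambda t: np.sin(2 * t) * C0 + 1.0
dC = lambda t: 2 * np.cos(2 * t) * C0

def residual(t, X):
    return A(t) @ X + X @ B(t) - C(t)

def rhs(t, Z, g1, g2, eta):
    """Right-hand side of (19) for the stacked state Z = [X, Y], where Y is the integral of E."""
    X, Y = Z
    E = residual(t, X)
    Rt = -g1 * E - g2 * Y - dA(t) @ X - X @ dB(t) + dC(t) + eta
    M = np.kron(np.eye(K_dim), A(t)) + np.kron(B(t).T, np.eye(I_dim))   # column-major vec
    Xd = np.linalg.solve(M, Rt.flatten(order="F")).reshape(I_dim, K_dim, order="F")
    return np.stack([Xd, E])                                             # d/dt [X, Y] = [Xd, E]

def rk4(f, Z, T, h):
    """Classical fourth-order Runge-Kutta from t = 0 to t = T with step h."""
    t = 0.0
    for _ in range(int(round(T / h))):
        k1 = f(t, Z)
        k2 = f(t + h / 2, Z + h / 2 * k1)
        k3 = f(t + h / 2, Z + h / 2 * k2)
        k4 = f(t + h, Z + h * k3)
        Z, t = Z + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4), t + h
    return Z, t

eta, g1, g2 = 100.0, 10.0, 25.0     # large constant noise
Z0 = np.zeros((2, I_dim, K_dim))    # X(0) = O and Y(0) = O
norms = {}
for gamma2 in (g2, 0.0):            # gamma2 = 0 recovers NTCTZNN1 (Remark 3.2)
    Z, T = rk4(lambda t, Z: rhs(t, Z, g1, gamma2, eta), Z0, T=6.0, h=2e-3)
    norms[gamma2] = np.linalg.norm(residual(T, Z[0]))
print(norms)   # near zero for gamma2 = 25, about sqrt(6) * eta / g1 for gamma2 = 0
```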

(3) Setting \(e(t)=\mathcal{E}(t)\) in the following improved Zhang design formula:

$$ \dot{e}(t)=-\gamma _{1} F\bigl(e(t) \bigr)-\gamma _{2}G \biggl(e(t)+\gamma _{1} \int _{0}^{t}F \bigl(e(s) \bigr)\,ds \biggr)+ \bar{\eta }, \quad \gamma _{1}\geq 0, \gamma _{2}>0, $$
(20)

where \(F(\cdot )\), \(G(\cdot )\) are two activation-function arrays, we get the third noise-tolerant CTZNN (NTCTZNN3) for TVSTEs:

$$ \begin{aligned}[b] &\mathcal{A}(t)*_{n} \dot{\mathcal{X}}+\dot{\mathcal{X}}*_{m} \mathcal{B}(t) \\ &\quad =-\gamma _{1}F\bigl(\mathcal{A}(t)*_{n} \mathcal{X}+\mathcal{X}*_{m} \mathcal{B}(t)-\mathcal{C}(t)\bigr) \\ &\quad\quad{} -\gamma _{2}G \biggl(\mathcal{A}(t)*_{n} \mathcal{X}+\mathcal{X}*_{m} \mathcal{B}(t)-\mathcal{C}(t) \\ &\quad\quad{}+\gamma _{1} \int _{0}^{t}F \bigl(\mathcal{A}(s)*_{n} \mathcal{X}+\mathcal{X}*_{m}\mathcal{B}(s)-\mathcal{C}(s) \bigr)\,ds \biggr) \\ &\quad\quad{} -\dot{\mathcal{A}}(t)*_{n}\mathcal{X}- \mathcal{X}*_{m} \dot{\mathcal{B}}(t)+\dot{\mathcal{C}}(t)+\bar{\eta }. \end{aligned} $$
(21)

Obviously, NTCTZNN3 (21) can be written as

$$ \dot{\mathcal{E}}(t)=-\gamma _{1}F\bigl(\mathcal{E}(t)\bigr)-\gamma _{2}G \biggl( \mathcal{E}(t)+\gamma _{1} \int _{0}^{t}F \bigl(\mathcal{E}(s) \bigr)\,ds \biggr)+\bar{\eta }, $$

which can be decoupled into IK differential equations:

$$ \begin{aligned}[b] \dot{e}_{i_{1}\cdots i_{n}k_{1}\cdots k_{m}}(t)&=-\gamma _{1}f\bigl(e_{i_{1} \cdots i_{n}k_{1}\cdots k_{m}}(t)\bigr) \\ &\quad{} -\gamma _{2}g \biggl(e_{i_{1}\cdots i _{n}k_{1}\cdots k_{m}}(t)+\gamma _{1} \int _{0}^{t}f \bigl(e_{i_{1}\cdots i_{n}k_{1}\cdots k_{m}}(s) \bigr)\,ds \biggr)+\eta .\end{aligned} $$
(22)

Define an intermediate variable \(y_{i_{1}\cdots i_{n}k_{1}\cdots k _{m}}(t)\) as follows:

$$ y_{i_{1}\cdots i_{n}k_{1}\cdots k_{m}}(t)=e_{i_{1}\cdots i_{n}k_{1} \cdots k_{m}}(t)+\gamma _{1} \int _{0}^{t}f \bigl(e_{i_{1}\cdots i_{n}k _{1}\cdots k_{m}}(s) \bigr)\,ds. $$
(23)

To establish the convergence of NTCTZNN3 (21), we make the following assumption about the constant noise η̄.

Assumption 3.1

The constant noise \(\bar{\eta }=(\eta ) \in \mathbb{R}^{I_{1}\times \cdots \times I_{n}\times K_{1}\times \cdots \times K_{m}}\) satisfies

$$ \bigl\vert \eta y_{i_{1}\cdots i_{n}k_{1}\cdots k_{m}}(t) \bigr\vert \leq \alpha \gamma _{2} y _{i_{1}\cdots i_{n}k_{1}\cdots k_{m}}(t)g\bigl(y_{i_{1}\cdots i_{n}k_{1} \cdots k_{m}}(t)\bigr), \quad t\geq 0, $$

where \(\alpha \in (0,1)\), and \(g(\cdot )\) is the scalar function that the activation function \(G(\cdot )\) applies to each entry.

Theorem 3.3

Under Assumption 3.1, the residual error of NTCTZNN3 (21) satisfies \(\lim_{t\rightarrow \infty }\Vert \mathcal{E}(t)\Vert \leq \sqrt{IK}g^{-1}(\eta /\gamma _{2})\), which can be made arbitrarily small by choosing \(\gamma _{2}\) large enough.

Proof

Taking the time-derivative of (23) on both sides, one has

$$ \dot{y}_{i_{1}\cdots i_{n}k_{1}\cdots k_{m}}(t)=\dot{e}_{i_{1}\cdots i_{n}k_{1}\cdots k_{m}}(t)+ \gamma _{1}f \bigl(e_{i_{1}\cdots i_{n}k_{1} \cdots k_{m}}(t) \bigr). $$
(24)

Then substituting (24) into (22), we have

$$ \dot{y}_{i_{1}\cdots i_{n}k_{1}\cdots k_{m}}(t)=-\gamma _{2}g (y _{i_{1}\cdots i_{n}k_{1}\cdots k_{m}} )+\eta . $$
(25)

For the differential equation (25), let us define a Lyapunov function candidate

$$ v_{i_{1}\cdots i_{n}k_{1}\cdots k_{m}}(t)=y^{2}_{i_{1}\cdots i_{n}k _{1}\cdots k_{m}}(t)/2\geq 0, $$

whose time-derivative is

$$\begin{aligned}& \dot{v}_{i_{1}\cdots i_{n}k_{1}\cdots k_{m}}(t) \\& \quad =y_{i_{1}\cdots i_{n}k_{1}\cdots k_{m}}(t)\dot{y}_{i_{1}\cdots i _{n}k_{1}\cdots k_{m}}(t) \\& \quad =-\gamma _{2} y_{i_{1}\cdots i_{n}k_{1}\cdots k_{m}}(t)g\bigl(y_{i_{1}\cdots i_{n}k_{1}\cdots k_{m}}(t)\bigr)+\eta y_{i_{1}\cdots i_{n}k_{1}\cdots k _{m}}(t) \\& \quad \leq -(1-\alpha )\gamma _{2} y_{i_{1}\cdots i_{n}k_{1}\cdots k_{m}}(t)g\bigl(y _{i_{1}\cdots i_{n}k_{1}\cdots k_{m}}(t) \bigr). \end{aligned}$$

Since \(g(\cdot )\) is an odd and strictly monotone increasing function, we have \(y_{i_{1}\cdots i_{n}k_{1}\cdots k_{m}}(t)\times g(y_{i_{1}\cdots i_{n}k_{1} \cdots k_{m}}(t))>0\) for \(y_{i_{1}\cdots i_{n}k_{1}\cdots k_{m}}(t) \neq 0\), and \(y_{i_{1}\cdots i_{n}k_{1}\cdots k_{m}}(t)g(y_{i_{1} \cdots i_{n}k_{1}\cdots k_{m}}(t))=0\) if and only if \(y_{i_{1}\cdots i_{n}k_{1}\cdots k_{m}}(t)=0\). This shows that \(\dot{v}_{i_{1}\cdots i_{n}k_{1}\cdots k_{m}}(t)\) is negative definite. Then, by Lyapunov stability theory, the equilibrium \(g^{-1}(\eta /\gamma _{2})\) of (25) is globally asymptotically stable. Consequently, by the definition of \(y_{i_{1}\cdots i_{n}k_{1}\cdots k_{m}}(t)\) in (23), one has

$$ \lim_{t\rightarrow \infty } \biggl(e_{i_{1}\cdots i_{n}k_{1}\cdots k_{m}}(t)+ \gamma _{1} \int _{0}^{t}f \bigl(e_{i_{1}\cdots i_{n}k_{1}\cdots k_{m}}(s) \bigr)\,ds \biggr)=g^{-1}(\eta /\gamma _{2}). $$
(26)

Since \(f(\cdot )\) is also an odd and monotone increasing function, one has

$$ \bigl\vert e_{i_{1}\cdots i_{n}k_{1}\cdots k_{m}}(t) \bigr\vert \leq \biggl\vert e_{i_{1}\cdots i _{n}k_{1}\cdots k_{m}}(t)+\gamma _{1} \int _{0}^{t}f \bigl(e_{i_{1}\cdots i_{n}k_{1}\cdots k_{m}}(s) \bigr)\,ds \biggr\vert . $$

This and (26), \(\eta \geq 0\), \(\gamma _{1},\gamma _{2}>0\) imply that

$$ \lim_{t\rightarrow \infty } \bigl\vert e_{i_{1}\cdots i_{n}k_{1}\cdots k_{m}}(t) \bigr\vert \leq g^{-1}(\eta /\gamma _{2}). $$

So

$$ \lim_{t\rightarrow \infty } \bigl\Vert \mathcal{E}(t) \bigr\Vert \leq \sqrt{IK}g^{-1}( \eta /\gamma _{2}). $$

This completes the proof. □
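The key step of the proof is that the auxiliary variable (23) obeys the decoupled scalar dynamics (25), whose equilibrium is \(g^{-1}(\eta /\gamma _{2})\). A small simulation of one entry of (22), with our own choices \(f(e)=e\), \(g(y)=y^{3}\) (so \(g^{-1}(u)=u^{1/3}\)), \(\gamma _{1}=2\), \(\gamma _{2}=5\) and \(\eta =40\), lets one check that y approaches \(g^{-1}(\eta /\gamma _{2})=2\) and that the entrywise residual ends within that bound.

```python
import numpy as np

gamma1, gamma2, eta = 2.0, 5.0, 40.0   # hypothetical parameters
f = lambda e: e                         # linear activation for F
g = lambda y: y ** 3                    # power activation for G, so g^{-1}(u) = cbrt(u)
y_star = np.cbrt(eta / gamma2)          # predicted equilibrium of (25): g^{-1}(eta/gamma2)

def rhs(state):
    """One entry of (22), with z(t) = int_0^t f(e(s)) ds so that y = e + gamma1 * z as in (23)."""
    e, z = state
    y = e + gamma1 * z
    return np.array([-gamma1 * f(e) - gamma2 * g(y) + eta, f(e)])

state, h = np.array([1.5, 0.0]), 1e-3   # e(0) = 1.5, z(0) = 0
for _ in range(10_000):                 # classical RK4 up to t = 10
    k1 = rhs(state)
    k2 = rhs(state + h / 2 * k1)
    k3 = rhs(state + h / 2 * k2)
    k4 = rhs(state + h * k3)
    state = state + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

e_T, y_T = state[0], state[0] + gamma1 * state[1]
print(y_T, y_star, abs(e_T))   # y approaches g^{-1}(eta/gamma2); |e| ends below that bound
```

In this particular run the entrywise residual actually decays to zero, which is consistent with the upper bound of Theorem 3.3.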

Remark 3.4

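When \(F(\cdot )\) and \(G(\cdot )\) are identity mappings and \(\gamma _{1}>0\), (20) becomes \(\dot{e}(t)=-(\gamma _{1}+\gamma _{2})e(t)-\gamma _{1}\gamma _{2}\int _{0}^{t}e(s)\,ds+\bar{\eta }\), so NTCTZNN3 (21) reduces to NTCTZNN2 (16) with design parameters \(\gamma _{1}+\gamma _{2}\) and \(\gamma _{1}\gamma _{2}\) in place of \(\gamma _{1}\) and \(\gamma _{2}\).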

Finally, for comparison, we develop a gradient-based neural network (GNN) to solve TVSTEs. Define the scalar-valued norm-based energy function:

$$ \xi (\mathcal{X})= \bigl\Vert \mathcal{A}(t)*_{n}\mathcal{X}+ \mathcal{X}*_{m} \mathcal{B}(t)-\mathcal{C}(t) \bigr\Vert ^{2}/2. $$

Then, based on Theorem 3.1 in [13], the gradient of the energy function \(\xi (\mathcal{X})\) is

$$ \frac{\partial \xi (\mathcal{X})}{\partial \mathcal{X}}=\mathcal{A} ^{\top }(t)*_{n} \mathcal{E}(t)+\mathcal{E}(t)*_{m} \mathcal{B}^{\top }(t), $$

where \(\mathcal{E}(t)=\mathcal{A}(t)*_{n}\mathcal{X}+\mathcal{X}*_{m} \mathcal{B}(t)-\mathcal{C}(t)\). Thus, we get the following GNN:

$$ \dot{\mathcal{X}}=-\gamma \bigl(\mathcal{A}^{\top }(t)*_{n} \mathcal{E}(t)+ \mathcal{E}(t)*_{m}\mathcal{B}^{\top }(t) \bigr)+\bar{\eta }, $$
(27)

where \(\gamma >0\) is a design parameter and \(\bar{\eta }\) denotes the additive noise.
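To make the tensor operations in (27) concrete, the following pure-Python sketch stores a tensor as a dictionary from index tuples to values, implements the Einstein product and the tensor transpose (taken here to swap the two index groups, which is our reading of \(\mathcal{A}^{\top }\)), and evaluates the energy \(\xi (\mathcal{X})\), its gradient \(\mathcal{A}^{\top }(t)*_{n}\mathcal{E}(t)+\mathcal{E}(t)*_{m}\mathcal{B}^{\top }(t)\) and the right-hand side of GNN (27). All function names are hypothetical, and the dense-dictionary layout favors readability over speed.

```python
import random
from itertools import product

# Hypothetical helpers (not the authors' code); tensors are dense dicts {index tuple: value}.


def shape_of(T):
    """Infer the shape of a dense tensor from its index tuples."""
    order = len(next(iter(T)))
    return tuple(max(idx[d] for idx in T) + 1 for d in range(order))


def einstein(A, B, n):
    """Einstein product A *_n B: contract the last n modes of A with the first n modes of B."""
    sa, sb = shape_of(A), shape_of(B)
    if sa[len(sa) - n:] != sb[:n]:
        raise ValueError("contracted modes must have equal dimensions")
    out = {}
    for i in product(*(range(s) for s in sa[:len(sa) - n])):
        for j in product(*(range(s) for s in sb[n:])):
            out[i + j] = sum(A[i + k] * B[k + j]
                             for k in product(*(range(s) for s in sb[:n])))
    return out


def transpose(T, p):
    """Tensor transpose (assumed convention): the first p modes move behind the rest."""
    return {idx[p:] + idx[:p]: v for idx, v in T.items()}


def residual(A, B, C, X, n, m):
    """E = A *_n X + X *_m B - C."""
    AX, XB = einstein(A, X, n), einstein(X, B, m)
    return {k: AX[k] + XB[k] - C[k] for k in C}


def energy(A, B, C, X, n, m):
    """xi(X) = ||E||^2 / 2 with the Frobenius norm."""
    return 0.5 * sum(v * v for v in residual(A, B, C, X, n, m).values())


def energy_gradient(A, B, C, X, n, m):
    """Gradient of xi: A^T *_n E + E *_m B^T."""
    E = residual(A, B, C, X, n, m)
    left = einstein(transpose(A, n), E, n)
    right = einstein(E, transpose(B, m), m)
    return {k: left[k] + right[k] for k in E}


def gnn_rhs(A, B, C, X, n, m, gamma, noise):
    """Right-hand side of GNN (27): -gamma * gradient + noise (noise given as a tensor)."""
    G = energy_gradient(A, B, C, X, n, m)
    return {k: -gamma * G[k] + noise[k] for k in G}


def random_tensor(shape, rng):
    """Dense tensor with entries drawn uniformly from [-1, 1]."""
    return {idx: rng.uniform(-1.0, 1.0) for idx in product(*(range(s) for s in shape))}
```

A central finite-difference check of `energy` against `energy_gradient` on random data is a quick way to confirm the gradient formula; `gnn_rhs` can then be handed to any ODE integrator.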

Remark 3.5

The main difference between GNN (27) and NTCTZNN1-3 is that the former does not exploit the time-derivative information \(\dot{\mathcal{A}}(t)\), \(\dot{\mathcal{B}}(t)\) and \(\dot{\mathcal{C}}(t)\), which NTCTZNN1-3 use to enhance their efficiency.

The following theorem establishes the convergence of GNN (27) for static Sylvester tensor equations, i.e., when \(\dot{\mathcal{A}}=\mathcal{O}\), \(\dot{\mathcal{B}}=\mathcal{O}\) and \(\dot{\mathcal{C}}=\mathcal{O}\); its proof is motivated by Theorems 1 and 2 in [38].

Theorem 3.4

For GNN (27), the following conclusions hold:

  (1) GNN (27) converges exponentially to the theoretical solution of static Sylvester tensor equations without noise;

  (2) the computational error of GNN (27) for static Sylvester tensor equations with noise is upper bounded. Furthermore, if the design parameter γ tends to positive infinity, the steady-state solution error diminishes to zero.

Proof

Set \(\varPhi _{II}(\mathcal{A})=A\), \(\varPhi _{IK}(\mathcal{X}(t))=X(t)\), \(\varPhi _{KK}(\mathcal{B})=B\), \(\varPhi _{IK}(\mathcal{C})=C\), \(\varPhi _{IK}( \mathcal{E})=E\), \(\varPhi _{IK}(\bar{\eta })=\theta \). Since \(\varPhi \) maps Einstein products to matrix products, the residual of the Sylvester tensor equations can be written in matrix form as

$$ E=AX+XB-C, $$

i.e.,

$$ \operatorname{vec}(E)=\bigl(I_{K}\otimes A+B^{\top } \otimes I_{I}\bigr)\operatorname{vec}\bigl(X(t)\bigr)- \operatorname{vec}(C)=\bigl(A\oplus B^{\top }\bigr)\operatorname{vec} \bigl(X(t)\bigr)-\operatorname{vec}(C). $$
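The two identities used above, \(\varPhi (\mathcal{A}*_{n}\mathcal{X})=AX\) and \(\operatorname{vec}(AX+XB)=(I_{K}\otimes A+B^{\top }\otimes I_{I})\operatorname{vec}(X)\), can be checked numerically. The sketch below assumes that \(\varPhi \) orders each index group column-major (MATLAB style) and that vec stacks columns; any fixed ordering used consistently on both sides would work equally well. All helper names are illustrative, not taken from the paper.

```python
import random
from itertools import product
from math import prod

# Hypothetical helpers; tensors are dense dicts {index tuple: value}, matrices are lists of rows.


def shape_of(T):
    order = len(next(iter(T)))
    return tuple(max(idx[d] for idx in T) + 1 for d in range(order))


def einstein(A, B, n):
    """Einstein product A *_n B (contract the last n modes of A with the first n of B)."""
    sa, sb = shape_of(A), shape_of(B)
    out = {}
    for i in product(*(range(s) for s in sa[:len(sa) - n])):
        for j in product(*(range(s) for s in sb[n:])):
            out[i + j] = sum(A[i + k] * B[k + j] for k in product(*(range(s) for s in sb[:n])))
    return out


def linear_index(idx, dims):
    """Column-major (MATLAB-style) position of a multi-index; this ordering is an assumption."""
    pos, stride = 0, 1
    for i, d in zip(idx, dims):
        pos += i * stride
        stride *= d
    return pos


def unfold(T, n):
    """Phi: the first n modes index rows, the remaining modes index columns."""
    shp = shape_of(T)
    rdims, cdims = shp[:n], shp[n:]
    M = [[0.0] * prod(cdims) for _ in range(prod(rdims))]
    for idx, v in T.items():
        M[linear_index(idx[:n], rdims)][linear_index(idx[n:], cdims)] = v
    return M


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def kron(P, Q):
    """Kronecker product P (x) Q."""
    return [[P[i][j] * Q[k][l] for j in range(len(P[0])) for l in range(len(Q[0]))]
            for i in range(len(P)) for k in range(len(Q))]


def eye(n):
    return [[float(i == j) for j in range(n)] for i in range(n)]


def transpose(P):
    return [list(col) for col in zip(*P)]


def vec(P):
    """Stack the columns of P into a single list."""
    return [P[i][j] for j in range(len(P[0])) for i in range(len(P))]


def identity_gaps(A, B, C, X, n, m):
    """Largest deviations in Phi(E) = AX + XB - C and in the vec/Kronecker form of E."""
    Am, Bm, Cm, Xm = unfold(A, n), unfold(B, m), unfold(C, n), unfold(X, n)
    AXt, XBt = einstein(A, X, n), einstein(X, B, m)
    E = unfold({k: AXt[k] + XBt[k] - C[k] for k in C}, n)
    AX, XB = matmul(Am, Xm), matmul(Xm, Bm)
    gap_unfold = max(abs(E[i][j] - (AX[i][j] + XB[i][j] - Cm[i][j]))
                     for i in range(len(E)) for j in range(len(E[0])))
    kron_a, kron_b = kron(eye(len(Bm)), Am), kron(transpose(Bm), eye(len(Am)))
    M = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(kron_a, kron_b)]
    x, c, e = vec(Xm), vec(Cm), vec(E)
    Mx = [sum(r[j] * x[j] for j in range(len(x))) for r in M]
    gap_vec = max(abs(e[p] - (Mx[p] - c[p])) for p in range(len(e)))
    return gap_unfold, gap_vec


def random_tensor(shape, rng):
    return {idx: rng.uniform(-1.0, 1.0) for idx in product(*(range(s) for s in shape))}
```

Both gaps should be at rounding level, for square cases such as \(I_{1}=I_{2}=K_{1}=K_{2}=2\) as well as for non-square index groups.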

So GNN (27) can be vectorized as

$$\begin{aligned} \operatorname{vec}\bigl(\dot{X}(t)\bigr) =&-\gamma \bigl(I_{K} \otimes A^{\top }+B\otimes I _{I}\bigr) \bigl(\bigl(A\oplus B^{\top }\bigr)\operatorname{vec}\bigl(X(t)\bigr)-\operatorname{vec}(C) \bigr)+ \operatorname{vec}(\theta ) \\ =&-\gamma \bigl(A^{\top }\oplus B\bigr) \bigl(\bigl(A\oplus B^{\top }\bigr)\operatorname{vec}\bigl(X(t)\bigr)- \operatorname{vec}(C) \bigr)+\operatorname{vec}(\theta ). \end{aligned}$$

Assume that \(\mathcal{X}^{*}\) is a solution of the Sylvester tensor equations, and set \(\varPhi _{IK}(\mathcal{X}^{*})=X^{*}\). Then

$$ \bigl(A\oplus B^{\top }\bigr)\operatorname{vec}\bigl(X^{*}(t) \bigr)=\operatorname{vec}(C). $$

From the above equations, one has

$$\begin{aligned}& \operatorname{vec}\bigl(\dot{X}(t)\bigr)=-\gamma \bigl(A^{\top } \oplus B\bigr) \bigl(A\oplus B^{ \top }\bigr) \bigl(\operatorname{vec} \bigl(X(t)\bigr)-\operatorname{vec}\bigl(X^{*}\bigr) \bigr)+ \operatorname{vec}( \theta ), \end{aligned}$$

i.e.,

$$ \operatorname{vec}\bigl(\dot{X}(t)\bigr)-\operatorname{vec}\bigl( \dot{X}^{*}\bigr)=-\gamma M^{\top }M \bigl( \operatorname{vec}\bigl(X(t)\bigr)-\operatorname{vec}\bigl(X^{*} \bigr) \bigr)+\operatorname{vec}(\theta ), $$

where \(M=A\oplus B^{\top }\). Setting \(y(t)=\operatorname{vec}(X(t))- \operatorname{vec}(X^{*})\), we have

$$ \dot{y}(t)=-\gamma M^{\top }My(t)+\operatorname{vec}(\theta ). $$

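Let α denote the minimum eigenvalue of \(M^{\top }M\), which is assumed to be positive definite, so that \(\alpha >0\), and consider the Lyapunov function candidate \(v(y(t),t)=\Vert y(t)\Vert ^{2}/2\).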

(1) For the static Sylvester tensor equations without noise (\(\theta =0\)), we have

$$ \dot{v}\bigl(y(t),t\bigr)=-\gamma y^{\top }(t)M^{\top }My(t)\leq -\gamma \alpha y^{\top }(t)y(t)=-2\gamma \alpha v\bigl(y(t),t\bigr). $$

Thus,

$$ v\bigl(y(t),t\bigr)=\frac{ \Vert \operatorname{vec}(X(t))-\operatorname{vec}(X^{*}) \Vert ^{2}}{2} \leq v\bigl(y(0),0\bigr)\exp(-2\alpha \gamma t), $$

which implies that the neural state \(\mathcal{X}(t)\) converges to the theoretical solution \(\mathcal{X}^{*}\) with the exponential rate \(\alpha \gamma \).

(2) For the static Sylvester tensor equations with noise,

$$\begin{aligned} \dot{v}\bigl(y(t),t\bigr) \leq& -\gamma \alpha \bigl\Vert y(t) \bigr\Vert ^{2}+y^{\top }(t) \operatorname{vec}(\theta ) \\ \leq &-\sum_{i} \bigl\vert y_{i}(t) \bigr\vert \bigl(\alpha \gamma \bigl\vert y_{i}(t) \bigr\vert - \bigl\vert \bigl(\operatorname{vec}( \theta )\bigr)_{i} \bigr\vert \bigr). \end{aligned}$$

(i) If \(\alpha \gamma \vert y_{i}(t) \vert - \vert (\operatorname{vec}(\theta ))_{i} \vert \geq 0\) for all \(i\) and \(t\), then \(\dot{v}(y(t),t)\leq 0\), and according to the Lyapunov stability theory [39], the time-varying vector \(y(t)\) converges to zero as time evolves. Thus \(\mathcal{X}(t)\) converges to the theoretical solution \(\mathcal{X}^{*}\).

(ii) If \(\alpha \gamma \vert y_{i}(t) \vert - \vert (\operatorname{vec}(\theta ))_{i} \vert \leq 0\) for some \(i\) and \(t\), then the function \(v(y(t),t)\) may increase. However, this inequality directly yields

$$ \bigl\vert y_{i}(t) \bigr\vert \leq \frac{ \vert (\operatorname{vec}(\theta ))_{i} \vert }{\alpha \gamma }. $$

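Since \(\vert (\operatorname{vec}(\theta ))_{i} \vert \leq \Vert \operatorname{vec}(\theta ) \Vert =\Vert \bar{\eta } \Vert \) and \(y(t)\) has \(IK\) entries, overall one has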

$$ \bigl\Vert y(t) \bigr\Vert \leq \frac{\sqrt{IK} \Vert \bar{\eta } \Vert }{\alpha \gamma }. $$

This completes the proof. □
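Both parts of the proof reduce to the linear error dynamics \(\dot{y}(t)=-\gamma M^{\top }My(t)+\operatorname{vec}(\theta )\). The sketch below integrates them with forward Euler for a hypothetical diagonal \(M\), so that \(\alpha \) is simply the smallest \(m_{i}^{2}\); it illustrates the two conclusions under these assumptions and is not the solver used in the numerical section. With \(\theta =0\) the trajectory can be checked against \(\Vert y(t)\Vert \leq \Vert y(0)\Vert \exp (-\alpha \gamma t)\), and with noise against \(\vert y_{i}\vert \leq \vert (\operatorname{vec}(\theta ))_{i}\vert /(\alpha \gamma )\).

```python
import math

# Hypothetical illustration: diagonal M, forward Euler (not ode45).


def euler_error_dynamics(m_diag, theta, y0, gamma, h, steps):
    """Forward Euler for dy/dt = -gamma * M^T M y + theta with M = diag(m_diag).

    Here M^T M = diag(m_i^2), so each component evolves independently.
    Returns the trajectory [y_0, y_1, ..., y_steps].
    """
    if h * gamma * max(mi * mi for mi in m_diag) >= 1.0:
        raise ValueError("step too large: need h * gamma * max(m_i^2) < 1")
    y = list(y0)
    trajectory = [list(y)]
    for _ in range(steps):
        y = [yi + h * (-gamma * mi * mi * yi + ti) for yi, mi, ti in zip(y, m_diag, theta)]
        trajectory.append(y)
    return trajectory


def norm(v):
    return math.sqrt(sum(x * x for x in v))


# Hypothetical test data: M = diag(3, 4, 5, 6), so alpha = min m_i^2 = 9.
m_diag, gamma, h = [3.0, 4.0, 5.0, 6.0], 10.0, 1e-3
alpha = min(mi * mi for mi in m_diag)
clean = euler_error_dynamics(m_diag, [0.0] * 4, [1.0, -1.0, 2.0, 0.5], gamma, h, 500)
theta = [0.5, -1.0, 1.0, 0.2]
noisy = euler_error_dynamics(m_diag, theta, [0.0] * 4, gamma, h, 2000)
```

The step size is kept below \(1/(\gamma \max _{i} m_{i}^{2})\), so each Euler factor \(1-h\gamma m_{i}^{2}\) lies in \((0,1)\); in the noisy run each component settles at \(\theta _{i}/(\gamma m_{i}^{2})\), which is at most \(\vert \theta _{i}\vert /(\alpha \gamma )\).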

Numerical results

In this section, two examples are presented to show the efficiency of the proposed NTCTZNN1-3 and GNN for solving TVSTEs. All experiments are performed on a ThinkPad laptop with an Intel Core 2 CPU (2.10 GHz) and 4.00 GB RAM, and all codes are written in Matlab R2014a. All first-order ordinary differential equations encountered are solved by the built-in function ode45.

Example 4.1

Consider the time-varying Sylvester tensor equations (\(m=n=2\), \(I_{1}=I_{2}=K_{1}=K_{2}=2\)):

$$ \mathcal{A}(t)*_{n}\mathcal{X}+ \mathcal{X}*_{m}\mathcal{B}(t)= \mathcal{C}(t), \quad t\in [0,10], $$
(28)

where

$$\begin{aligned}& \mathcal{A}(:,:,1,1)= \begin{bmatrix} 2+\cos (2t)&0 \\ 0&0 \end{bmatrix} , \quad\quad \mathcal{A}(:,:,1,2)= \begin{bmatrix} 0&1-\sin (2t) \\ 0&0 \end{bmatrix} , \\& \mathcal{A}(:,:,2,1)= \begin{bmatrix} 0&0 \\ 2-\cos (2t)&0 \end{bmatrix} , \quad\quad \mathcal{A}(:,:,2,2)= \begin{bmatrix} 0&0 \\ 0&1+\sin (2t) \end{bmatrix} , \\& \mathcal{B}(:,:,1,1)= \begin{bmatrix} 2-\sin (2t)&0 \\ 0&0 \end{bmatrix} , \quad\quad \mathcal{B}(:,:,1,2)= \begin{bmatrix} 0&1 \\ 0&0 \end{bmatrix} , \\& \mathcal{B}(:,:,2,1)= \begin{bmatrix} 0&0 \\ 1+\cos (2t)&0 \end{bmatrix} ,\quad\quad \mathcal{B}(:,:,2,2)= \begin{bmatrix} 0&0 \\ 0&1 \end{bmatrix} , \\& \mathcal{C}=\mathcal{A}. \end{aligned}$$

We set the noise \(\eta =1\) and the initial state tensor \(\mathcal{X} _{0}=\mathcal{I}\).
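In (28) every slice of \(\mathcal{A}(t)\) and \(\mathcal{B}(t)\) has its only nonzero entry at \((i_{1},i_{2})=(k_{1},k_{2})\). From this observation (ours, read off from the slices above), \((\mathcal{A}*_{2}\mathcal{X})_{ij}=a_{ii}x_{ij}\) and \((\mathcal{X}*_{2}\mathcal{B})_{ij}=x_{ij}b_{jj}\) for multi-indices \(i=(i_{1},i_{2})\), \(j=(j_{1},j_{2})\), so (28) decouples into \((a_{ii}+b_{jj})x_{ij}=c_{ij}\), and every \(a_{ii}+b_{jj}\) is positive for all \(t\). Since \(\mathcal{C}=\mathcal{A}\), the theoretical solution vanishes whenever \(i\neq j\), and, for instance, \(\mathcal{X}^{*}(t)(1,1,1,1)=(2+\cos 2t)/(4+\cos 2t-\sin 2t)\), \(\mathcal{X}^{*}(t)(1,2,1,2)=(1-\sin 2t)/(2-\sin 2t)\), \(\mathcal{X}^{*}(t)(2,1,2,1)=(2-\cos 2t)/3\) and \(\mathcal{X}^{*}(t)(2,2,2,2)=(1+\sin 2t)/(2+\sin 2t)\); these are the entries whose trajectories are plotted in Fig. 4. The Python sketch below (0-based indices, hypothetical names) builds the data, forms this closed-form solution and confirms that the residual vanishes.

```python
import math
from itertools import product

# Hypothetical helpers; tensors are dense dicts {(i1, i2, k1, k2): value} with 0-based indices.


def shape_of(T):
    order = len(next(iter(T)))
    return tuple(max(idx[d] for idx in T) + 1 for d in range(order))


def einstein(A, B, n):
    """Einstein product A *_n B (contract the last n modes of A with the first n of B)."""
    sa, sb = shape_of(A), shape_of(B)
    out = {}
    for i in product(*(range(s) for s in sa[:len(sa) - n])):
        for j in product(*(range(s) for s in sb[n:])):
            out[i + j] = sum(A[i + k] * B[k + j] for k in product(*(range(s) for s in sb[:n])))
    return out


def example41(t):
    """A(t), B(t), C(t) of Example 4.1, translated from the MATLAB-style slices in (28)."""
    A = {idx: 0.0 for idx in product(range(2), repeat=4)}
    B = dict(A)
    s, c = math.sin(2 * t), math.cos(2 * t)
    # Each slice A(:,:,k1,k2) has a single nonzero entry, located at (i1, i2) = (k1, k2).
    A[0, 0, 0, 0], A[0, 1, 0, 1], A[1, 0, 1, 0], A[1, 1, 1, 1] = 2 + c, 1 - s, 2 - c, 1 + s
    B[0, 0, 0, 0], B[0, 1, 0, 1], B[1, 0, 1, 0], B[1, 1, 1, 1] = 2 - s, 1.0, 1 + c, 1.0
    return A, B, dict(A)  # C(t) = A(t)


def theoretical_solution(t):
    """Closed form x_ij = c_ij / (a_ii + b_jj), valid because A and B act entrywise here."""
    A, B, C = example41(t)
    return {(i1, i2, j1, j2): C[i1, i2, j1, j2] / (A[i1, i2, i1, i2] + B[j1, j2, j1, j2])
            for (i1, i2, j1, j2) in C}
```

Such a closed form gives an exact reference trajectory against which any neural-network solution of (28) can be compared.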

Firstly, we use the GNN with \(\gamma =500\) or \(\gamma =1000\) to solve problem (28), and the generated residual errors \(\Vert \mathcal{E}(t) \Vert \) are shown in Fig. 1. The final residual errors generated by the GNN with \(\gamma =500\) and the GNN with \(\gamma =1000\) are 0.0047 and 0.0024, respectively, and the corresponding numbers of iterations are 136,041 and 272,077.

Figure 1
figure1

The trajectory of the residual error \(\Vert \mathcal{E}(t) \Vert \) synthesized by GNN

Secondly, we use NTCTZNN1 to solve problem (28). Figure 2 shows the residual error \(\Vert \mathcal{E}(t) \Vert \) generated by NTCTZNN1; the final residual errors generated by NTCTZNN1 with \(\gamma =500\) and with \(\gamma =1000\) are 0.0080 and 0.0040, respectively. As Fig. 2 illustrates, NTCTZNN1 with \(\gamma =1000\) performs better than NTCTZNN1 with \(\gamma =500\), which is in accordance with Theorem 3.1. In addition, the numbers of iterations of NTCTZNN1 with \(\gamma =500\) and with \(\gamma =1000\) are 6541 and 12,561, respectively, which are less than 5% of those required by GNN. Therefore, NTCTZNN1 greatly reduces the computational cost, although it is slightly less accurate than GNN.

Figure 2
figure2

The trajectory of the residual error \(\Vert \mathcal{E}(t) \Vert \) synthesized by NTCTZNN1

Thirdly, we use NTCTZNN2 to solve problem (28). The residual error \(\Vert \mathcal{E}(t) \Vert \) generated by NTCTZNN2 with \(\gamma _{1}= \gamma _{2}=10\) is displayed in Fig. 3, and the final residual error is \(1.5972\times 10^{-4}\). In addition, the number of iterations of NTCTZNN2 with \(\gamma _{1}=\gamma _{2}=10\) is 373, far fewer than that of NTCTZNN1.

Figure 3
figure3

The trajectory of the residual error \(\Vert \mathcal{E}(t) \Vert \) synthesized by NTCTZNN2

Generally speaking, a large design parameter can enhance the efficiency of NTCTZNN. However, we also observed that NTCTZNN with a large design parameter often takes more CPU time, because the larger the design parameter is, the smaller the step size chosen by ode45 is. In fact, the CPU times of NTCTZNN2 with \(\gamma _{1}=\gamma _{2}=10\) and NTCTZNN2 with \(\gamma _{1}=\gamma _{2}=100\) are 1.4219 and 1.8594, respectively. Comparing Figs. 2 and 3, we find that: (1) NTCTZNN2 with \(\gamma _{1}= \gamma _{2}=10\) is more efficient than NTCTZNN1 with \(\gamma =1000\), because the final residual error generated by the former is about \(10^{-4}\), while that generated by the latter is about \(10^{-3}\); (2) the residual error generated by NTCTZNN1 stabilizes quickly, whereas the residual error generated by NTCTZNN2 keeps decreasing over the tested time period \([0,10]\), which is in accordance with Theorem 3.2. Overall, NTCTZNN2 obtains a more accurate solution without increasing the computational cost. The neural states \(\mathcal{X}(t)(1,1,1,1)\), \(\mathcal{X}(t)(1,2,1,2)\), \(\mathcal{X}(t)(2,1,2,1)\), \(\mathcal{X}(t)(2,2,2,2)\) computed by NTCTZNN2 are plotted in Fig. 4, which shows that the neural states converge to the corresponding entries of the theoretical solution. (Here the theoretical solution \(\mathcal{X}^{*}(t)\) is denoted by -dotted blue curves, and the neural-network solutions are denoted by +-dotted red curves.)
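Why a larger design parameter forces smaller steps can be seen on the scalar model \(\dot{y}=-\gamma y\). MATLAB's ode45 is an explicit Runge–Kutta solver (the Dormand–Prince pair), and explicit methods stay stable only for step sizes of order \(1/\gamma \). As a simplified stand-in, the sketch below uses forward Euler (an assumption, not ode45 itself) and shows that the iteration decays for \(h<2/\gamma \) and blows up beyond that limit; the function names are hypothetical.

```python
# Hypothetical illustration with forward Euler on dy/dt = -gamma * y (not ode45).


def euler_amplitude(gamma, h, steps, y0=1.0):
    """|y_N| after N forward-Euler steps; each step multiplies y by (1 - h * gamma)."""
    y = y0
    for _ in range(steps):
        y += h * (-gamma * y)
    return abs(y)


def euler_step_limit(gamma):
    """Forward Euler is stable on dy/dt = -gamma * y exactly when 0 < h < 2 / gamma."""
    return 2.0 / gamma
```

For \(\gamma =10\) the limit is \(h<0.2\), while for \(\gamma =1000\) it shrinks to \(h<0.002\), so covering the same interval \([0,10]\) requires proportionally more steps.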

Figure 4
figure4

The trajectory of some neural states synthesized by NTCTZNN2

Fourthly, we use NTCTZNN3 with \(\gamma _{1}=\gamma _{2}=10\) to solve problem (28), and the activation function is set as the sign-bi-power activation function [37]:

$$ f(x)=g(x)=k \biggl(\frac{1}{2}\operatorname{sig}^{\tilde{\eta }}(x)+ \frac{1}{2}\operatorname{sig}^{1/\tilde{\eta }}(x) \biggr), \quad k>0, $$

where \(\tilde{\eta }\in (0,1)\) and \(\operatorname{sig}^{\tilde{\eta }}(x)\) is defined as

$$ \operatorname{sig}^{\tilde{\eta }}(x)= \textstyle\begin{cases} \vert x \vert ^{\tilde{\eta }}, & \text{if } x>0, \\ 0, & \text{if } x=0, \\ - \vert x \vert ^{\tilde{\eta }}, & \text{if } x< 0. \end{cases}\displaystyle \quad = \operatorname{sign}(x) \vert x \vert ^{\tilde{\eta }}. $$
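The sign-bi-power activation can be coded directly from this definition. The sketch below is a hypothetical Python helper (the names `sig` and `sign_bi_power` are ours), with the values \(\tilde{\eta }=0.5\) and \(k=1\) used in this example as defaults.

```python
import math

# Hypothetical helpers implementing the sign-bi-power activation defined above.


def sig(x, p):
    """sig^p(x) = sign(x) * |x|**p, with sig^p(0) = 0."""
    return math.copysign(abs(x) ** p, x) if x != 0 else 0.0


def sign_bi_power(x, eta_t=0.5, k=1.0):
    """f(x) = g(x) = k * (sig^eta_t(x) / 2 + sig^(1/eta_t)(x) / 2), 0 < eta_t < 1, k > 0."""
    if not (0.0 < eta_t < 1.0) or k <= 0.0:
        raise ValueError("require 0 < eta_t < 1 and k > 0")
    return k * (0.5 * sig(x, eta_t) + 0.5 * sig(x, 1.0 / eta_t))
```

For instance, `sign_bi_power(4.0)` evaluates \(\frac{1}{2}\cdot 2+\frac{1}{2}\cdot 16=9\); the \(\vert x\vert ^{1/\tilde{\eta }}\) term dominates for large errors and the \(\vert x\vert ^{\tilde{\eta }}\) term for small ones.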

We set \(\tilde{\eta }=0.5\) and \(k=1\). The residual error \(\Vert \mathcal{E}(t) \Vert \) generated by NTCTZNN3 with \(\gamma _{1}=\gamma _{2}=2\) is displayed in Fig. 5; the final residual error is \(9.8868\times 10^{-5}\), and the number of iterations is 1097. Although NTCTZNN3 attains a slightly smaller final residual error than NTCTZNN2 (\(1.5972\times 10^{-4}\)), it requires about three times as many iterations (1097 versus 373); taking both accuracy and the number of iterations into account, NTCTZNN2 is more efficient than NTCTZNN3.

Figure 5
figure5

The trajectory of the residual error \(\Vert \mathcal{E}(t) \Vert \) synthesized by NTCTZNN3

Overall, taking both criteria, i.e., accuracy and number of iterations, into consideration, we rank the four neural networks as follows:

$$ \mathrm{NTCTZNN2}\succ \mathrm{NTCTZNN3}\succ \mathrm{NTCTZNN1}\succ \mathrm{GNN}, $$

in which \(a\succ b\) means that a is more efficient than b.

Although NTCTZNN1-3 and GNN are designed for TVSTEs, they can also be used to solve static Sylvester tensor equations, which can be viewed as a special case of TVSTEs (i.e., \(\dot{\mathcal{A}}= \dot{\mathcal{B}}=\dot{\mathcal{C}}=\mathcal{O}\)). For simplicity, we apply only NTCTZNN3 to a static Sylvester tensor equation in the following example.

Example 4.2

Let us consider a static Sylvester tensor equation with \(m=n=2\) and \(I_{1}=I_{2}=K_{1}=K_{2}=2\):

$$ \mathcal{A}*_{n}\mathcal{X}+ \mathcal{X}*_{m}\mathcal{B}=\mathcal{C}, $$
(29)

where

$$\begin{aligned}& \mathcal{A}(:,:,1,1)= \begin{bmatrix} 1&1 \\ 2&-4 \end{bmatrix} , \quad\quad \mathcal{A}(:,:,1,2)= \begin{bmatrix} 1&1 \\ -1&1 \end{bmatrix} , \\& \mathcal{A}(:,:,2,1)= \begin{bmatrix} 3&10 \\ -12&-8 \end{bmatrix} ,\quad\quad \mathcal{A}(:,:,2,2)= \begin{bmatrix} 6&9 \\ 3&10 \end{bmatrix} , \\& \mathcal{B}(:,:,1,1)= \begin{bmatrix} 2&-1 \\ -2&5 \end{bmatrix} , \quad\quad \mathcal{B}(:,:,1,2)= \begin{bmatrix} 5&3 \\ -1&-3 \end{bmatrix} , \\& \mathcal{B}(:,:,2,1)= \begin{bmatrix} 10&3 \\ 1&2 \end{bmatrix} , \quad\quad \mathcal{B}(:,:,2,2)= \begin{bmatrix} 2&5 \\ 2&10 \end{bmatrix} , \\& \mathcal{C}=\mathcal{A}. \end{aligned}$$

We set the noise \(\eta =1\) and the initial state tensor \(\mathcal{X} _{0}=\mathcal{I}\). The numerical results generated by NTCTZNN3 with \(\gamma _{1}=\gamma _{2}=10\) are plotted in Fig. 6, and the final residual error is \(2.1473\times 10^{-4}\).
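A reference solution for (29) can also be obtained without any neural network by solving the linear system behind it. The following sketch (our illustration; all names are hypothetical) assembles the \(16\times 16\) matrix of the map \(\mathcal{X}\mapsto \mathcal{A}*_{2}\mathcal{X}+\mathcal{X}*_{2}\mathcal{B}\) by applying it to unit tensors, which up to the ordering of the unknowns is the matrix \(A\oplus B^{\top }\) from the proof of Theorem 3.4, and solves it by Gaussian elimination with partial pivoting. It assumes that this matrix is nonsingular, as a unique theoretical solution requires.

```python
from itertools import product

# Hypothetical illustration; tensors are dense dicts {(i1, i2, k1, k2): value}, 0-based.


def shape_of(T):
    order = len(next(iter(T)))
    return tuple(max(idx[d] for idx in T) + 1 for d in range(order))


def einstein(A, B, n):
    """Einstein product A *_n B (contract the last n modes of A with the first n of B)."""
    sa, sb = shape_of(A), shape_of(B)
    out = {}
    for i in product(*(range(s) for s in sa[:len(sa) - n])):
        for j in product(*(range(s) for s in sb[n:])):
            out[i + j] = sum(A[i + k] * B[k + j] for k in product(*(range(s) for s in sb[:n])))
    return out


def slices_to_tensor(slices):
    """Turn MATLAB-style slices T(:,:,k1,k2) (1-based k1, k2) into a 0-based dict."""
    T = {}
    for (k1, k2), mat in slices.items():
        for i1, row in enumerate(mat):
            for i2, value in enumerate(row):
                T[i1, i2, k1 - 1, k2 - 1] = float(value)
    return T


A = slices_to_tensor({(1, 1): [[1, 1], [2, -4]], (1, 2): [[1, 1], [-1, 1]],
                      (2, 1): [[3, 10], [-12, -8]], (2, 2): [[6, 9], [3, 10]]})
B = slices_to_tensor({(1, 1): [[2, -1], [-2, 5]], (1, 2): [[5, 3], [-1, -3]],
                      (2, 1): [[10, 3], [1, 2]], (2, 2): [[2, 5], [2, 10]]})
C = dict(A)  # C = A in Example 4.2


def sylvester_map(X):
    """X -> A *_2 X + X *_2 B for the data of (29)."""
    AX, XB = einstein(A, X, 2), einstein(X, B, 2)
    return {k: AX[k] + XB[k] for k in X}


def solve_dense(M, rhs):
    """Gaussian elimination with partial pivoting for a small dense system M x = rhs."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(M, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("matrix is numerically singular")
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def theoretical_solution():
    keys = sorted(C)  # fixed ordering of the 16 unknowns
    # Column q of the system matrix is the image of the q-th unit tensor.
    columns = [sylvester_map({k: float(k == p) for k in keys}) for p in keys]
    M = [[columns[q][k] for q in range(len(keys))] for k in keys]
    return dict(zip(keys, solve_dense(M, [C[k] for k in keys])))
```

The returned dictionary can serve as the reference against which the NTCTZNN3 trajectory is compared.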

Figure 6
figure6

Convergence of NTCTZNN3 for the solution of problem (29)

As shown in Fig. 6, the neural state computed by NTCTZNN3 converges to the theoretical solution, and the residual error \(\Vert \mathcal{E}(t) \Vert \) converges to zero. These results demonstrate the effectiveness of NTCTZNN3 for static Sylvester tensor equations. Furthermore, the convergence properties of GNN also need to be investigated.

Conclusion

By following Zhang et al.’s design method, we have proposed three noise-tolerant continuous-time Zhang neural networks (NTCTZNNs) and a gradient-based neural network (GNN) to solve the time-varying Sylvester tensor equations with noise, and have established their various convergence results. These complement some existing results. Numerical results substantiate the efficacy and superiority of the proposed NTCTZNNs.

It is possible to extend the ideas in this paper to other types of tensor equations, such as time-varying periodic Sylvester tensor equations or time-varying coupled Sylvester tensor equations. Furthermore, in the future we will apply the designed neural networks to the path-tracking control of different robot manipulators.

References

  1. Einstein, A.: The foundation of the general theory of relativity. In: Kox, A.J., Klein, M.J., Schulmann, R. (eds.) The Collected Papers of Albert Einstein, vol. 6. Princeton University Press, Princeton (2007)

  2. Haussühl, S.: Physical Properties of Crystals. Wiley, Weinheim (2007)

  3. Brazell, M., Li, N., Navasca, C., Tamon, C.: Solving multilinear systems via tensor inversion. SIAM J. Matrix Anal. Appl. 34, 542–570 (2013)

  4. Zhang, Y.N., Jiang, D.C., Wang, J.: A recurrent neural network for solving Sylvester equation with time-varying coefficients. IEEE Trans. Neural Netw. 13(5), 1053–1063 (2002)

  5. Zhou, B., Duan, G.R., Lin, Z.: A parametric periodic Lyapunov equation with application in semi-global stabilization of discrete-time periodic systems subject to actuator saturation. Automatica 47, 316–325 (2011)

  6. Wang, Q.W., He, Z.H.: Systems of coupled generalized Sylvester matrix equations. Automatica 50(11), 2840–2844 (2014)

  7. Sun, M., Wang, Y.J.: The conjugate gradient methods for solving the generalized periodic Sylvester matrix equations. J. Appl. Math. Comput. 60, 413–434 (2019)

  8. Sun, L.Z., Zheng, B.D., Bu, C.J., Wei, Y.M.: Moore–Penrose inverse of tensors via Einstein product. Linear Multilinear Algebra 64, 686–698 (2016)

  9. Li, B.W., Sun, Y.S., Zhang, D.W.: Chebyshev collocation spectral methods for coupled radiation and conduction in a concentric spherical participating medium. J. Heat Transf. 131, 062701 (2009)

  10. Li, B.W., Tian, S., Sun, Y.S., Mao, Z.: Schur-decomposition for 3D matrix equations and its application in solving radiative discrete ordinates equations discretized by Chebyshev collocation spectral method. J. Comput. Phys. 229, 1198–1212 (2010)

  11. Ding, F., Chen, T.W.: Gradient based iterative algorithms for solving a class of matrix equations. IEEE Trans. Autom. Control 50(8), 1216–1221 (2005)

  12. Wang, Q.W., Xu, X.J.: Iterative algorithms for solving some tensor equations. Linear Multilinear Algebra 67, 1325–1349 (2019)

  13. Huang, B.H., Ma, C.F.: An iterative algorithm to solve the generalized Sylvester tensor equations. Linear Multilinear Algebra (2018). https://doi.org/10.1080/03081087.2018.1536732

  14. Lv, L.L., Zhang, Z., Zhang, L., Wang, W.S.: An iterative algorithm for periodic Sylvester matrix equations. J. Ind. Manag. Optim. 14(1), 413–425 (2018)

  15. Hajarian, M.: Generalized conjugate direction algorithm for solving the general coupled matrix equations over symmetric matrices. Numer. Algorithms 73(3), 591–609 (2016)

  16. Hajarian, M.: New finite algorithm for solving the generalized nonhomogeneous Yakubovich-transpose matrix equation. Asian J. Control 19(1), 164–172 (2017)

  17. Hajarian, M.: Computing symmetric solutions of general Sylvester matrix equations via Lanczos version of biconjugate residual algorithm. Comput. Math. Appl. 76(4), 686–700 (2018)

  18. Lv, L.L., Zhang, Z.: On the periodic Sylvester equations and their applications in periodic Luenberger observers design. J. Franklin Inst. 353(5), 1005–1018 (2016)

  19. Lv, L.L., Zhang, Z.: Finite iterative solutions to periodic Sylvester matrix equations. J. Franklin Inst. 354(5), 2358–2370 (2017)

  20. Lv, L.L., Zhang, Z.: A parametric poles assignment algorithm for second-order linear periodic systems. J. Franklin Inst. 354(8), 8057–8071 (2017)

  21. Lv, L.L., Zhang, Z., Zhang, L.: A periodic observers synthesis approach for LDP systems based on iteration. IEEE Access 6, 8539–8546 (2018)

  22. Lv, L.L., Zhang, Z., Zhang, L., Liu, X.X.: Gradient based approach for generalized discrete-time periodic coupled Sylvester matrix equations. J. Franklin Inst. 355(15), 7691–7705 (2018)

  23. Hajarian, M.: Matrix iterative methods for solving the Sylvester-transpose and periodic Sylvester matrix equations. J. Franklin Inst. 350(10), 3328–3341 (2013)

  24. Hajarian, M.: Generalized reflexive and anti-reflexive solutions of the coupled Sylvester matrix equations via CD algorithm. J. Vib. Control 24(2), 343–356 (2016)

  25. Hajarian, M.: Least squares solution of the linear operator equation. J. Optim. Theory Appl. 170(1), 205–219 (2016)

  26. Hajarian, M.: Solving the general Sylvester discrete-time periodic matrix equations via the gradient based iterative method. Appl. Math. Lett. 52, 87–95 (2016)

  27. Sun, M., Wang, Y.J., Liu, J.: Two modified least-squares iterative algorithms for the Lyapunov matrix equations. Adv. Differ. Equ. 2019, 305 (2019)

  28. Guo, D.S., Lin, X.J., Su, Z.Z., Sun, S.B., Huang, Z.J.: Design and analysis of two discrete-time ZD algorithms for time-varying nonlinear minimization. Numer. Algorithms 77(1), 23–36 (2018)

  29. Sun, M., Tian, M.Y., Wang, Y.J.: Discrete-time Zhang neural networks for time-varying nonlinear optimization. Discrete Dyn. Nat. Soc. 4745759, 1–14 (2019)

  30. Sun, M., Wang, Y.J.: General five-step discrete-time Zhang neural network for time-varying nonlinear optimization. Bull. Malays. Math. Sci. Soc. (2019). https://doi.org/10.1007/s40840-019-00770-4

  31. Zhang, Y.N., Li, Z.: Zhang neural network for online solution of time-varying convex quadratic program subject to time-varying linear-equality constraints. Phys. Lett. A 373, 1639–1643 (2009)

  32. Jin, L., Zhang, Y.N.: Discrete-time Zhang neural network of \(\mathcal{O}(\tau ^{3})\) pattern for time-varying matrix pseudoinversion with application to manipulator motion generation. Neurocomputing 142, 165–173 (2014)

  33. Sun, M., Liu, J.: General six-step discrete-time Zhang neural network for time-varying tensor absolute value equations. Discrete Dyn. Nat. Soc. (2019). Accepted

  34. Brazell, M., Li, N., Navasca, C., Tamon, C.: Solving multilinear systems via tensor inversion. SIAM J. Matrix Anal. Appl. 34, 542–570 (2013)

  35. Guo, D.S., Zhang, Y.N.: Zhang neural network versus gradient-based neural network for time-varying linear matrix equation solving. Neurocomputing 74, 3708–3712 (2011)

  36. Jin, L., Zhang, Y.N.: Integration-enhanced Zhang neural network for real-time-varying matrix inversion in the presence of various kinds of noises. IEEE Trans. Neural Netw. Learn. Syst. 27(12), 2615–2627 (2016)

  37. Li, S., Chen, S., Liu, B.: Accelerating a recurrent neural network to finite-time convergence for solving time-varying Sylvester equation by using a sign-bi-power activation function. Neural Process. Lett. 37, 1–17 (2015)

  38. Yi, C.F., Chen, Y.H., Lu, Z.H.: Improved gradient-based neural networks for online solution of Lyapunov matrix equation. Inf. Process. Lett. 111, 780–786 (2011)

  39. Ge, S.S., Lee, T.H., Harris, C.J.: Adaptive Neural Network Control of Robotic Manipulators. World Scientific, London (1998)


Acknowledgements

The authors thank two anonymous reviewers for their valuable comments and suggestions, which have helped them in improving the paper.

Funding

This work is supported by the National Natural Science Foundation of China and Shandong Province (No. 11671228, 11601475, ZR2016AL05), the Doctoral Initiation Fund of Zaozhuang University and the Qingtan Scholar Project of Zaozhuang University.

Author information

The first author proposed the problem and proved the main results, and the second author carried out the numerical experiments. All authors read and approved the final manuscript.

Correspondence to Sun Min.

Ethics declarations

Competing interests

The authors declare that there are no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


About this article


Cite this article

Min, S., Jing, L. Noise-tolerant continuous-time Zhang neural networks for time-varying Sylvester tensor equations. Adv Differ Equ 2019, 465 (2019) doi:10.1186/s13662-019-2406-8


Keywords

  • Time-varying Sylvester tensor equations
  • Noise-tolerant continuous-time Zhang neural network
  • Gradient-based neural network
  • Global convergence