A parallel algorithm for space-time-fractional partial differential equations

This paper is dedicated to the derivation of a simple parallel in space and time algorithm for space and time fractional evolution partial differential equations. We report the stability, the order of the method and provide some illustrating numerical experiments.


Introduction
This paper is devoted to the derivation of a parallel algorithm for solving fractional in space and time fractional partial differential equations. We will first focus on the parallelization in-time using a parareal algorithm, which is the most standard parallel-in-time algorithm for solving differential equations [20,29]. In this goal, we focus in the first part of the paper on the derivation and analysis of a parallel-in-time algorithm for fractional ordinary differential equations (FODEs) [40]. More specifically, a parareal-Gorenflo's scheme [40] is derived and analyzed for FODEs. We notice that the analysis of convergence of the parareal method for solving ODEs remains partially valid for fractional equations [22,32]. Then, we provide a parallel in space and time algorithm for fractional in space and time partial differential equations. Basically, the spatial approximation is provided thanks to (i) a Fourier-based pseudospectral method, which will benefit from the scalability of standard parallel FFT algorithms [18] and (ii) an efficient combination with the parallel-in-time algorithm.
A key element of the proposed overall algorithm is the parareal method. Let us recall its principle for solving differential equations. It consists in 1. first solving sequentially the differential equation over [0; T] using a large time-step T = T/N for some N ∈ N * , that is, on a coarse grid in time; 2. solving (using the same or more accurate scheme as the one used on the coarse mesh) in parallel on say q processors (typically such that N ∈ qN * for an efficient workload) with a time-step t = T/R for some R 1, the equation on q subdomains {[T qi ; T q(i+1) ]} 0≤i≤R-1 , where T n = n T for 0 ≤ n ≤ N , T N = T, and with initial conditions (at {T n } 0≤n≤N-1 ) on each subdomain, given by the solution initially computed on the coarse grid (Step 1). The parareal method allows for an accurate parallel computation of ODEs, and we refer to [16,21,29] for details about this celebrated method for solving ordinary/partial differential equations. It is in particular successfully combined with traditional domain decomposition methods in space. Indeed it allows us to use a very large number of processors going beyond the usual limits of efficiency of domain decomposition methods with a too large number of spatial subdomains. In this paper, we are interested in the extension of the parareal method to FODEs, which have become very popular over this past decade. The recent progress in fractional differential equation solvers [6,7,25,27,30,33,39] is largely motivated by the development of fractional differential models in physics, mechanics, epidemiology, and applied probability allowing to take into account nonlocal in space or time effects [9-11, 15, 26, 30, 35, 37]. The main purpose of this paper is then to study efficient parallel algorithms for fractional ordinary and partial differential equations. In particular, we propose an original algorithm combining a parallelization in space and time. Very few works actually exist on parallel-in-time methods for FODEs; however, let us cite [38] where a parareal method along with collocation and Fourier-based FODE solvers is developed. At this stage, we do not consider realist models from the literature, but focus on toy-scalar linear equations, for which it is possible to provide a relatively precise analysis and to exhibit the strong convergence and efficiency properties. In this paper, we first propose a parareal version of standard Gorenflo's scheme for approximating FODE [40]. In this goal, we consider for 0 < α < 1, T > 0, and where f is a non-highly oscillatory Lipschitz continuous function. Lipschitz continuity allows for existence of a unique solution, while the non-highly oscillatory condition will allow for an efficient and accurate use standard quadratures. In particular the existence of unique solutions can be proved in some weighted subsets of C α , see [19] for details. The analysis will be presented for the parareal-Gorenflo scheme approximating a linear FODE: where λ > 0, 0 < α < 1, and g ∈ C 0 ([0; T]). The operator D α t = C D α t is here defined as a Caputo's derivative [40], that is, where the special gamma function Γ , for Re(z) > 0, is defined as and Γ (p) = (p -1)! for p ∈ N * . We also recall the definition of Caputo's derivative C I α t y(t) Hence, (1) and (say) for y(0) = 0, and y solution to (2) we have The well-posedness (existence and uniqueness) of this equation is analyzed in [14] for Lipschitz functions f . Let us also refer to [28,31,34] for other types of fractional differential equations. The main difficulty in parallelizing a FODE comes from the fact that the fractional derivative is a nonlocal integro-differential operator. As a consequence, it is no more possible to directly and efficiently compute in parallel the differential equation on fine grids, as usually proposed in the parareal method. In this paper, we propose a natural strategy which consists in computing in parallel the fractional integrals by decomposing them into a local (containing the "singularity" and computed with a fine resolution) and a history part (computed with a coarse resolution).
In the second part of the paper, we are then interested in a parallel algorithm for spacetime fractional differential equations of the form with λ in the set of continuous and bounded real function C 0 b (R), 0 < α < 1 and β > 0. We recall that, denoting m = β , Caputo's derivative D β x = C D β x in this case is given by Notice that several alternative definitions of fractional derivatives exist [12] such as the Riemann-Liouville (RL) fractional derivative of order β over the interval ] -∞; x], which is defined by while the right RL fractional derivative is given by Similarly, we introduce the left fractional Riemann-Liouville integral operator of order β as follows: Through these notations, we have the relation Let us also recall that the two derivatives are linked by the simple relation for some real a. These definitions can naturally be used to define spatial fractional derivatives. However, to define the spatial fractional derivative, we rather use the Fourier-based definition (Riesz's derivative denoted by R D β x ) which will be more convenient and which could still be implemented on a bounded spatial domain. Denoting by ξ the co-variable to x in Fourier space, and by F the corresponding Fourier transform, we define Based on this definition, we will use a pseudospectral method based on discrete Fourier transform for spatial approximation. We again refer to [6,7,13,30,36] for some discussions about fractional operators and fractional differential equations. Finally, we discuss the combination of the parallel in time and space algorithms along with some numerical experiments. This paper is organized as follows. Section 2 is dedicated to the derivation of the parareal method based on Gorenflo's scheme for solving FODE. Using in particular the parareal method and standard parallel FFTs, we then derive in Sect. 3 a parallel algorithm for spacetime fractional differential equations. We finally conclude in Sect. 4.

Parareal-Gorenflo algorithm for fractional differential equations
We denote by T a coarse grid time-step and by t a fine grid time-step such that T = R t. The coefficient R ∈ N\{0} corresponds to the number of subiterations in-time for each large time-step T. We denote T n = n T and t n;i := n T + iδt for n = 0, . . . , N -1 and for i = 0, . . . , R such that t 0,0 = 0 and t N-1;R = T. For a total of N coarse grid time iterations, we proceed to -sequential computations on [0; T N ]; -parallel computations on each [T n ; T n+1 ] requiring computations from [0; T n ] and on the grids [T n ; T n+1 ] = R i=1 [t n;i-1 ; t n;i ], with n = 0, . . . , N -1. We now derive the algorithm and provide some analytical details.

Gorenflo's scheme
We approximate (2) with Gorenflo's scheme [17], which reads (on the coarse grid, using the notation {y n T } n approximating {y(T n )} n ) as follows: where y n T approximates y(T n ), and the weights read Gorenflo's scheme, which extends Grünwald-Letnikov's idea of a finite difference approximation of the fractional integral. In theory any other FODE solver could be combined with the parareal method such as those presented in [23] or the one used in [38]. The coefficients w (α) i for α = 0.25, 0.5, 0.75 are reported for 1 ≤ i ≤ 20 in Fig. 1. The choice of this method is motivated by its simplicity and the fact that it is possible to easily increase its order of convergence. Other methods, such as the ones used in [23], can naturally be explored.
Numerical experiment 1. We present a simple experiment from [17]: for 0 < α < 1, , for which an explicit solution y exact (t) = t 2 exists. We report the convergence graph in Fig. 1 (Right), that is, the 2 -norm error as a function of the time-steps; we numerically estimate the order of convergence of the method which on this specific example (α = 0.5) leads to p ≈ 1.5. We refer to [25] on the derivation and convergence analysis of this scheme. In a compact form (7) reads where we use the convenient notation from [38]. The algebraic operator C T (from R n+1 to R n+2 ) basically constructs the solution at time t n+1 from the solutions at previous times from (7).
We first discuss the absolute stability of Gorenflo's scheme on a (coarse) grid: which can also be rewritten (8). We are interested in the conditional stability of Gorenflo's scheme. We then have the following.
Proof We aim to study the limit at infinity of the numerical solution to when t goes to infinity. As Gorenflo's method requires a priori y(0) = 0, we reformulate (9) as The solution to (9) is hence deduced from (10) by adding 1: y ← y + 1. Then Gorenflo's scheme with n ≥ 0 First we notice that w (α) 1 = -α ∈ (-1, 0) and more generally α < -w (α) i . As a consequence, we can rewrite the scheme as We prove by induction that {y n T } n is decreasing and bounded from below by -1: 1. We assume that y n T ≤ y n-1 T ≤ · · · ≤ y 0 = 0. We have Thus Hence, as (i) the coefficients w (α) i are negative, (ii) less than 1 in absolute value, and (iii) y 0 = 0, we get and conclude that at least for T ≤ (α/λ) 1/α we get y n+1 T ≤ y n T . Hence, the sequence {y n T } n is increasing. Notice that applying a standard discrete Gronwall inequality would lead to a similar conclusion and can be used for proving the convergence of the method.
2. We still assume by induction that y n T ≥ -1 for all n ≥ 0. As and Hence, for αλ T α ≥ 0, Thus, yet under the assumption αλ T α ≥ 0 and using y n T ≥ -1, we have Thus {y n T } n is decreasing and bounded then convergent to say y * . Taking the limit in n, we get Taking for instance T = (α/λ) 1/α in Gorenflo's scheme gives and using (14), we get We then deduce that the solution is convergent to y * = -1.

Parareal algorithm for fractional differential equations
We now explicitly derive a parallel algorithm for FODE. Each interval [T n ; Gorenflo's scheme on the fine grid reads It would be naturally highly inefficient to solve (even in parallel) the FODE on a "full" fine grid. In order to derive the parareal algorithm, we need to justify that the solution to (2) can be approximated thanks to a sum of "local" FODEs. If we decompose [0; T] = N-1 n=0 [T n ; T n+1 ], and for 1 ≤ n ≤ N -1, we denote by y n the solution to We naturally have the following.
This is a simple consequence on the fact that fractional derivatives and integrals are nonlocal operators. The integral decomposition must be carefully designed.
Proof Recall that y 1 and y n are the respective solutions to y 1 (0) = y 0 , and for n > 1, y n (T n ) = y n-1 (T n ).
Trivially we have y |[0;T 1 ] = y 1 . Next, on any interval [T n ; T n+1 ], we have y n (T n ) = y n-1 (T n ). If y |[T n ;T n+1 ] = y n , then for t ∈ [T n ; T n+1 ] which is in general not true.
The proposition shows that in order to derive a consistent parareal method, it is necessary on each [T n ; T n+1 ] to include a nonlocal contribution from [0; T n ], corresponding to Let us now detail the corresponding parareal method. Recall that different schemes can be used on the coarse and fine levels. Generally speaking, computations on fine grids will be only performed in parallel, while the coarse grid ones will be performed either in parallel or sequentially. We now detail the algorithms.
Algorithm The overall parareal algorithm reads using usual compact notations [29,38] where k denotes the domain decomposition in-time index/iteration, and where Y n;k T ∈ R n+1 represents [y 0;k T , . . . , y n;k T ], that is, the approximate solution on the coarse grid (but eventually computed by combining fine and coarse grid computations) at iteration k using the scheme (7). However, the computation of this quantity differs from the parareal method for ODEs due to the nonlocality of fractional operators. We proceed as follows: -for k ≥ 1, we compute in parallel F T Y n;k T = F T y 0;k T , . . . , F T y n;k T , and where F T (y n;k T ) denotes the approximate solution to which is equivalent, thanks to (3), to with initial data given at T n , and computed partially on a fine grid {t n;0 , . . . , t n;R } ∈ [T n ; T n+1 ] R+1 . Unlike the differential case, the nonlocality in time requires a special care. Indeed, on any interval [T n ; T n+1 ], we still need to include a contribution of former times (t < T n ). In other words, this leads to a potentially very computationally complex method, even if it is solved in parallel. More specifically, for t ∈ [T n ; T n+1 ], we rewrite In particular, for any T n ≤ t ≤ T n+1 , we also have We propose to approximate the first integral (history part) using a coarse grid approximation at iteration k, and the second one on a fine grid. More specifically, we approximate on the coarse grid Notice that we do not have to deal with singularities on this integral, any (higher order) quadrature rule can be utilized. We approximate this term at t n;i = T n + i t, say by (using a rectangle rule for simplicity) while the local part is approximated using the fine grid on [T n ; T n+1 ]. The local part can simply be rewritten as Over [T n ; T n+1 ], that is, Rn ≤ i + Rn ≤ (R + 1)n with 0 ≤ n ≤ N -1 and 0 ≤ i ≤ R, rather than computing on the fine grid a complete Gorenflo scheme which would be, as already mentioned above, highly inefficient from the computational point of view. We compute in parallel (indexed here by n) where H n;i is defined in (16), that is, the history contribution (from [0; T n ]) is computed on the coarse grid and is consistent with D α t y, see Fig. 2. In other words, the contribution, the fine-grid nR+i+1 j=1 w (α) j y nR+i+1-j;k t on [0; T n ], is replaced by a coarse grid approximation H n;i + i+1 j=1 w (α) j y nR+i+1-j;k t . These algebraic equations are explicit and are solved with linear complexity. It is important to notice that at this stage, and say at time t n , each processor should have access to the solution on the coarse grid at any time prior to t n , which is however cheap from the computational and memory usage points of view. The fine grid (local part) contribution could also be approximated using a rectangle rule; recall that the parareal method does not require that the same method be used on the fine-grid and coarse-grid.
-for k ≥ 1, the coarse grid contribution (prediction) reads formally C T Y n;k T = C T y 0;k T , . . . , C T y n;k T , and where C T (y n;k T ) denotes the approximate solution to D α t y(t) = -λy(t) + g(t), t ∈ [T n ; T n+1 ], computed on the coarse grid using Gorenflo's scheme (7).
The convergence criterion is as follows. We repeat the iterations for where δ is a small fixed parameter, and the corresponding k is denoted by k ∞ ∈ N * . The converged parareal-Gorenflo solution at final time is then Y N;k ∞ T . We propose an analysis of the order of the parareal method for the fractional equation approximated by the parareal-Gorenflo scheme. We propose to follow and then extend the analysis presented in [29].

Proposition 2.2
Assume that the algorithm on the fine grid is exact. Then, at iteration k, the parareal scheme has order k, that is, there exists c k (T) such that where y denotes the exact solution.
Notice that the extension to the case where the fine grid resolution is obtained by an approximate numerical method (as described in this paper) is more technical and is not presented here (see discussion below), but as in the classical parareal method for ODEs a similar conclusion is expected. We also refer to [38], where the authors arrive at the same conclusion with a different method. In order to prove the above proposition, we follow the same steps as [29].
Proof The proof relies on the estimate of the jumps defined as follows: i δ n+1-i;k + S n;k , and we define and Y n;k+1 = y n;k + n;k = y n;k + M -1 n S n;k . In particular y n;k+1 T = y n-1;k T (T n ) + δ n;k = G(λ, T)y n-1;k T + δ n;k .
As N nn = 1 and N n-1n = 0, and using exactly similar arguments as in [29] (although it is algebraically a bit more heavy), we get that there exists c > 0 such that That is, as n T ≤ T, there exists c k+1 (T) > 0 such that This concludes the proof.
Denoting by y n;k T (resp. y n;k t ) the solution on the coarse (resp. fine) grid, we get y n;k Ty(T n ) ≤ y n;k Ty n;k t + y n;k ty(T n ) . Then from y 1;k T -G(λ, T)y 0 T we have On the fine grid |y 1;k t -G(λ, T)y 0 T | ≤ c t T. This comes from the fact that the length of the interval is T and t is the fine grid time step. By induction, we have the following.

Corollary 2.1 At iteration k, the parareal algorithm such that Gorenflo's scheme is used on both the coarse and fine grids is an order k method, that is,
A detailed proof can actually be deduced from the theorem of convergence presented in [38].
Numerical experiment 2. We present some simple experiments to illustrate the method developed above. We still consider a benchmark presented in [17]: for 0 < α < 1, , for which an explicit solution y exact (t) = t 2 exists. We denote by Y n;k T the parareal/Gorenflo solution projected on the coarse grid at parareal iteration k. For different values of α, we then report the error max 1≤n≤N Y n exact -Y n;k T 2 in logscale in Fig. 3 (Left) for different values of α, with t = T/R and R = 8, T = 2 -6 . The test shows in particular the strong dependence of the convergence rate as a function of the fractional derivative order. We also report in Fig. 3 (Right) for α = 0.5, T = 2 -3 and respectively with R = 2, 4, 8, 16 subdomains the ∞ -norm error as function of k, that is, In Fig. 4, we finally report the graph of convergence, calculated from max 1≤n≤N Y n;k ∞ Ty(T n ) ∞ as a function of coarse time-steps T = 1/2 i for i = 3, 4, 5, 6 and R = 4 subdomains. All these experiments illustrate the convergence and efficiency of the parareal method combined with Gorenflo's scheme.

Computational complexity
We discuss the computational complexity and data storage of Gorenflo's scheme for N time iterations on a given time grid. At iteration 1 ≤ n ≤ N , the number of operations for updating the solution is linear in n, that is, the overall computational complexity is O(N 2 ). Moreover, at iteration N , O(N) data must be stored. Regarding the computational complexity of the parareal approach, we denote by N the number of iterations on the coarse grid and NR on the fine grid. At each parareal iteration k, (i) O(N 2 ) operations are performed sequentially on the coarse grid, (ii) O(NR 2 ) operations on each fine grid covering [T n ; T n+1 ]. The total number of operations is hence O(k ∞ N(N + R 2 )). On p processors, the complexity is O(k ∞ N(N + R 2 )/p) per processor. Notice that a sequential computation on a fine grid requires O(N 2 R 2 ) operations.

Space-time parallel algorithm for linear fractional in space-time differential equations
The approach which was presented above can be extended to fractional in space and time equation: with 0 < α < 1, β > 0, and λ ∈ C 0 b (R) on [0; T]×R and where D α t denotes Caputo's derivative and D β x denotes Riesz' derivative. In this case it is possible to implement a coupled pseudospectral parareal method. Formally, denoting by u(t, ·) = F x u(t, ·) the Fourier transform in space (where ξ denotes the co-variable associated with x), we first assume that λ is a real constant. We directly have We can then directly apply the parareal-Gorenflo's method derived above. We set y ξ (t) := (iξ ) β u(t, ξ ), hence D α t y ξ (t) = -λy ξ (t).
In this case the Fourier transform can easily be implemented in parallel, while the timefractional derivative can be parallelized using the parareal method. However, whenever λ is no more constant, even the sequential algorithm is not that simple anymore.

Sequential algorithm
When λ is space-dependent, it is no more possible to efficiently directly apply a Fourier transform in space. In the latter case, we detail the proposed algorithm and we will then focus on the parallelization aspects. We denote the spatial grid points and indices as follows:  17) is hence solved on a bounded spatial domain [-L x , L x ] with periodic boundary conditions. Regarding the pseudospectral approximations, we use the following notation: That is, u p (t) is an approximate discrete Fourier transform with and u k (t) is an approximation to u(t, x k ) obtained through an approximate inverse discrete Fourier transform with, for some c > 0, for s > 1/2 (in 1d) and u(t, ·) ∈ L 1 ∩ H s periodic. In general, we do not have u(t, x k ) = u k (t). Such pseudospectral projection is for instance studied in [8]. Typically, the high modes which are neglected in the above approximation lead to the following aliasing error estimates for u(t, ·) ∈ H r : there exists c > 0 such that for some r > s > 1/2 (in 1d) and u(t, ·) ∈ L 1 ∩ H r periodic. We yet refer to [8] for details. We then introduce a discrete operator This approximation was analyzed in [6] for approximating space fractional partial differential equations or the Dirac equation in [5]. Interestingly, when solving the equation as an initial boundary value problem (on a bounded domain [-L x , L x ]), the above spectral approach is still applicable. Indeed, imposing periodic boundary conditions u(t, L x ) = u(t, -L x ), it is in principle possible to include absorbing layers (17) in order to avoid (i) potential artificial wave reflections and to absorb (ii) periodic waves traveling from one boundary to the next. Basically, denoting by S an absorbing function where · 2 is the 2 ( xZ). Numerical experiment 3. In order to illustrate the spectral convergence in space, we propose a simple test over with u 0 (x) = exp(-x 2 + ik 0 x) cos(πx/2)/N , where k 0 = -1, T = 0.1, L x = 25, λ(x) = exp(-x 2 /100), and N is a normalization constant such that u 0 L 2 = 1. As far as we know, there is no explicit solution to this equation. We choose T = 10 -2 and make x vary. We report in logscale the 2 -norm error with a solution of reference at time t = T (computed on a 5 × 10 5 -point grid) as function of the space step from to 64 × 10 -4 to 4 × 10 -4 in Fig. 5 (Left), and the solution at final time T (Right). The experiment again shows the nice convergence properties of the proposed parallel methodology.

Space and time parallelization
In the following, we assume that p nodes/processors are used to solve in parallel a fractional space-time differential equations. Parallelization in space. The parallelization in space is relatively straightforward and is two-fold. We assume in the following that we have access to p processors.
1. It is first based on the parallelization of the discrete Fourier transform (or fast Fourier transform), namely of [[D β x ]]u n k , which approximates F -1 ((iξ ) β F)u(t, x k ). The parallelization in space is hence first based on the parallel computation of finite sums involved in the discrete Fourier transforms. This is typically implemented thanks to FFTW [18] on O N x and P N x .

Once [[D β
x ]]u n is computed, in order to update (in time) the approximate solution, we decompose x ;1≤ ≤p , For a very large number of processors p ∈ N\{0}, the first step of this parallelization in space, more specifically the FFT parallelization, can be become inefficient (corresponding to the discrete Fourier transform parallelization), in particular of course if p = O(N x ). The latter condition can indeed occur with high performance computers which can possess hundreds of thousands of processors. A coupling with parallelization in time becomes relevant and is thus discussed in the following paragraph. Parallelization in space and time. For a given number of processors p, we denote by p x and p t two integers such that p = p x + p t . The key element in the simultaneous parallelization in space and time is the commutation of D α t and λ(x)D β x , that is, [D α t , λ(x)D β x ] = 0, where [·, ·] is the operator commutator. Whenever p is very large, we propose a parallelization in space and time using p x (resp. p t ) processors for the parallelization in space (resp. time). More specifically, we decompose O N x in p x disjoint subdomains O N x = =1,...,p x O ( ) where we have denoted by u n T; = {u n;k T; } k the coarse grid in-time approximation of u (i) at time T n , (ii) iteration k, (iii) and r ∈ O ( ) N x . The parallelization in time requires communications, see Fig. 6.

Conclusion
We have proposed a simple extension of the parareal method combined with a fractional equation solver, namely Gorenflo's scheme. The same strategy, based on the parareal method, can naturally be adapted to other FODE solvers and still benefit from the outstanding convergence properties of the parareal method. Some mathematical properties, such as stability, accuracy, etc., were also proposed along with numerical experiments (Figs. 1, 4). The latter allowed to illustrate the analytical properties proven in the paper and to show the feasibility of the approach from a practical point of view. Extension to parallel algorithms to space-time fractional PDE solvers was also proposed. The spatial parallelization was relying on (i) the Riesz derivative and (ii) the parallelization of the fast Fourier transform, and was successfully combined with the parareal-based FODE solver, see Fig. 5.
The extension of the proposed strategy to nonlinear scalar equations is currently investigated as well as its application to fractional physical models.