Existence of a unique weak solution to a non-autonomous time-fractional diffusion equation with space-dependent variable order

In this contribution, we investigate an initial-boundary value problem for a fractional diffusion equation with Caputo fractional derivative of space-dependent variable order where the coefficients are dependent on spatial and time variables. We consider a bounded Lipschitz domain and a homogeneous Dirichlet boundary condition. Variable-order fractional differential operators originate in anomalous diffusion modelling. Using the strongly positive definiteness of the governing kernel, we establish the existence of a unique weak solution in u ∈ L∞((0, T ), H0( )) to the problem if the initial data belongs to H0( ). We show that the solution belongs to C([0, T ], H0( ) ∗ ) in the case of a Caputo fractional derivative of constant order. We generalise a fundamental identity for integro-differential operators of the form d dt (k ∗ v)(t) to a convolution kernel that is also space-dependent and employ this result when searching for more regular solutions. We also discuss the situation that the domain consists of separated subdomains.


Mathematical setting and motivation
The goal of this contribution is to show the existence and uniqueness of u for given f and u 0 such that The symbol ' * ' stands for the convolution product defined by where the kernel g 1-β(x) is defined by where denotes the gamma function, and β ∈ C( ) satisfies The term D β(x) t u := ∂ t (g * (u -ũ 0 )) in (2) represents the Caputo fractional derivative of space-dependent variable order β(x), which arises in the modelling of anomalous diffusion. Anomalous diffusion is a rapidly growing field of research with applications in physics, chemistry, biology and several other branches of engineering [1]. A typical example is heat conduction and fluid flow in porous media. We refer the reader to [2] for a comprehensive overview on fractional calculus and anomalous diffusion. To model diffusion processes in a homogeneous medium, the constant-order (β is constant over ) fractional diffusion model is sufficient. But in complex media, the presence of heterogeneous regions causes variations of permeability in different spatial positions. In this situation, the space-dependent variable-order model is more suitable to describe location-dependent diffusion processes, see e.g. [3][4][5][6][7][8][9][10][11]. We mention also the existence of time-dependent variable-order models when diffusion behaviour changes with the time evolution [12][13][14][15][16] and some other recent interesting applications of fractional calculus [17][18][19][20].
In the recent works [37,38], using a classical variational approach, the author established the existence of a unique weak solution to a non-autonomous time-fractional diffusion (respectively, wave) equation of constant and distributed order. The solution is continuous on [0, T] without restriction on the order of the fractional derivative and thus under low regularity assumptions. It is important to note here that weakly singular solutions are included in the class of admissible solutions.
The existence of a solution to non-autonomous variable-order fractional differential equation is proved in [39]. However, to the best of our knowledge there is only one result available in the literature related to the well-posedness of the time-fractional diffusion equation with space-dependent variable order. In [40], the authors investigated the existence and uniqueness result for problem (2) in the case that the elliptic operator L is autonomous. The authors do not use the usual definition of weak solution but they characterise the weak solution to (2) as the original of the solution to the Laplace transform (of tempered distributions) of (2) with respect to the time variable.

New aspects and outline
We study a more general multi-dimensional case for a governing PDE with timedependent coefficients (non-autonomous system). This implies that the Mittag-Leffler analysis or the approach by means of the Laplace transform is not appropriate and one has to use other tools to succeed. The approach in this paper follows the standard procedure for classical parabolic problems: first we discretize the problem in time using a convolution quadrature, next we obtain a priori estimates, and on the basis of these estimates we show the existence of a solution. In problem (2), we consider a homogeneous Dirichlet boundary condition. Throughout the paper we discuss also the situation wherein a homogeneous Neumann boundary condition is considered on the boundary. We assume that β is continuous on as for instance in [4,7]. However, in Sect. 8, we discuss the situation where consists of separated regions and thus β can be discontinuous, see e.g. [3,5,7].
The Caputo fractional time derivative of order β ∈ (0, 1) is a special case of the variableorder fractional derivative (i.e. β(x) = β in ). The authors in [41] noted that the smoothness ofũ 0 and f in (2) does not always imply the smoothness of the exact solution. The solution of such a problem is shown in general to have a weak singularity near the initial time t = 0. It is important to take this behaviour of the solution into account when studying the well-posedness of problem (2). This implies that compatibility condition as L(·, 0)ũ 0 (x) = f (·, 0) should be avoided as ∂ t (g * (u -ũ 0 ))(0) is not necessarily equal to zero in general [42]. Such a compatibility condition would restrict the class of admissible solutions.
One of the main points in the analysis is the property that the kernel g in the analysis is strongly positive definite. This and other properties of g are discussed in Sect. 2. The strongly positive definiteness of g is used in Theorem 6.2 to show the uniqueness of a solution to problem (2).
In Sect. 3, we generalise a fundamental identity for integro-differential operators of the form d dt (k * v)(t) by Zacher [25] to a convolution kernel k that depends on both the timeand space-variable, see Sect. 3 and in particular Lemma 3.1. Using this generalisation, we can show Corollary 3.1, which is in particular helpful for showing the uniqueness of a solution if we know that ∂ t (g * u)(t) ∈ L 2 ( ) for a.a. t ∈ (0, T). Moreover, we explain how to discretize the convolution in time (see Eq. (17)), and we derive a discrete estimate (see Lemma 3.3), which will be useful when establishing the a priori estimates later in the article.
In Sect. 4, the assumptions on the data are discussed and the weak formulation of (2) is derived. Afterwards, in Sect. 5, we propose a time-discrete scheme to approximate the solution at a single timestep. Moreover, a priori estimates are derived in the remainder of this section.
Then, the existence of a solution to the variational problem is discussed in Sect. 6. Under low regularity assumptions on the data (we suppose thatũ 0 ∈ H 1 0 ( ) and consider natural assumptions on f ), we show in Theorem 6.1 the existence of a solution to problem (2) in L 2 ((0, T), H 1 0 ( )). Then, using the strongly positive definiteness of the kernel, we establish the uniqueness of a solution in Theorem 6.2. However, we are not able to show that u(0) =ũ 0 for the Caputo fractional derivative with variable order. Next, in Sect. 7, we show how to overcome this problem if β(x) = β with β ∈ (0, 1) by showing that u ∈ C([0, T], H 1 0 ( ) * ) in this situation independent of the order of the fractional derivative. We slightly improve the results stated in [37], see Remark 1.1 for more details. Moreover, we study under which assumptions we can obtain a solution to problem (2) in L 2 ((0, T), H 2 ( ) ∩ H 1 0 ( )) if β(x) = β and A = a. The advantage here is that we can use Corollary 3.1 when establishing the uniqueness of a solution. Next, we show that if u 0 ∈ H 2 ( ) ∩ H 1 0 ( ) the solution can belong to C([0, T], L 2 ( )) under appropriate assumptions. We conclude this section with a discussion of the performance of the proposed scheme in the constant-order case. To improve the computational results, we consider a convolution quadrature on a graded mesh and compare the results with the well-known L1-algorithm.
Finally, in Sect. 8, we consider a domain consisting of multiple disjunct regions. The existence results obtained before are still valid in the case thatũ 0 ∈ H 1 0 ( ) but some interface conditions are needed. We also discuss in this section the L 2 ((0, T), H 2 ( ) ∩ H 1 0 ( )) regularity of the solution in this setting. Remark 1.1 (Comparison with [37]) In this paper, the positive definiteness of the kernel and the theory on Volterra equations is used to establish the uniqueness of a solution to problem (2). This approach is more careful than using [26,Corollary 16] as is done in [37, Theorem 2.1] (for constant and distributed order of the fractional derivative) since it is not clear that ∂ t (g * u)(t) ∈ L 2 ( ) for t ∈ (0, T) and that u is absolutely continuous under the assumptions of [37,Theorem 2.1]. This is the reason why we consider ∂ t (g * (u -ũ 0 )) in the formulation of problem (2) instead of g * ∂ t u in [37,Eq. (2)]. Moreover, although the constant-order fractional derivative is a special case, the space-dependent variable order considered in this contribution leads to a more complicated analysis in comparison with [37].
We conclude this introduction by stating the function spaces used in the paper. Remark 1.2 (Additional notations) Consider an abstract Banach space X with norm · X . Let p ≥ 1.

Properties of the singular kernel g
In this section, we give the properties of g that will be used throughout the paper. First, note that since (z) ≥ (1) = 1 for all z ∈ (0, 1). Thus We also obtain that g(x, ·) ∈ L 1 (0, T) for all x ∈ as follows: since (z) ≥ 1 2 for z ∈ (1, 2). It is also clear that g(x, ·) ∈ L 1 loc (0, ∞) for all x ∈ . Moreover, consider [t 1 , t 2 ] ⊂ (0, T), then i.e. ∂ t g(x, ·) ∈ L 1 loc (0, T). Moreover, as g is decreasing in time, we have the existence ofg such that We note also that the following shift formula is valid for s > 0: using that the function f (x) = x α with α ∈ (0, 1] is α-Hölder continuous. Another important property of g is contained in the following lemma.

Lemma 2.1 (Strongly positive definiteness)
For all v ∈ L 2 loc ((0, ∞), L 2 ( )), the kernel g satisfies Proof First, we note that we can interchange the order of integration as where we used Young's inequality for convolutions at position ( ): The function g is a strongly positive definite kernel for every x ∈ since it satisfies see [43,Corollary 2.2] or [44], i.e. for any x ∈ , there exists a constant γ > 0 (varying with x) such that g(x, t)γ exp(-t) is of positive type and thus 2 ) since for any y ∈ (-∞, ∞) and a.a. x ∈ we have that lim inf where we used that We conclude the proof by noting that for a.a. x ∈ it holds that

Technical lemmas
First, we generalise a fundamental identity for integro-differential operators of the form Then, for any v ∈ L 2 ((0, T), L 2 ( )) and any k : it holds for all t ∈ (0, T) that Proof First note that from it follows for a.a. x ∈ and all t ∈ [0, T] that From Leibniz's rule for differentiation under the integral sign, we get that which is valid and well-defined for all t ∈ (0, T). The second term on the right-hand side (RHS) can be handled as follows: , v(t) ds 1 Only the right-hand and the left-hand derivatives need to exist at the boundary points t = 0 and t = T , respectively.
where we switched the order of integration twice, which is allowed since ∂ t k is uniformly bounded. The last integral on the RHS of the previous equation can be rewritten as follows: where we again exchanged the order of integration two times and used Leibniz's rule two times. Combining the previous results concludes the proof.
The function g does not satisfy the conditions in the previous lemma, and therefore the lemma cannot be applied directly. It is the following lemma that will lead to the useful Proof From ∂ t k(x, ·) ∈ L 1 loc (0, T) it follows that k(x, ·) is continuous on (0, T). The only singularity of k(x, ·) is in t = 0 since it is a decreasing function in the time variable, i.e.
Hence, we can define the sequence (k n ) n∈N by We obtain from the properties of k that Hence, the function k n satisfies the conditions of Lemma 3.1. Moreover, we also have the uniform boundedness in L 2 ( ) and the continuity of v in the time variable, i.
Hence, we have that Therefore, integrating the result of Lemma 3.1 over (0, η) ⊂ (0, T) gives that Next, we want to pass to the limit n → ∞ in (12). First, note that The sequence f n : The functionf is absolutely integrable on (0, η) × since where we have used Young's inequality for convolutions (in the time variable) at position ( ). Therefore, we can apply the Lebesgue dominated theorem to obtain for the first term on RHS of (13) that We perform integration by parts in the time variable on the second term in the RHS of (13) and then pass to the limit n → ∞ in this term, i.e. we consider The dominating functions are absolutely integrable since Hence, again by the Lebesgue dominated theorem, we can take the limit n → ∞ in the RHS of (14). Therefore, we obtain for the second term on the RHS of (13) that Finally, we need to pass to the limit n → ∞ in the RHS of (12). We apply integration by parts on both terms to obtain that We only point out the limit transition for the first term by checking that the dominating functions are absolutely integrable as the second term can be handled similarly. Indeed, we get that Using the results above, we can pass to the limit n → ∞ in (12), which concludes the proof.
The function g defined in (3) satisfies the properties of the kernel in the previous lemma, see Eq. ((6)-(10)). However, the solution u does not satisfy ∂ t u ∈ L 2 (0, T). Fortunately, we can use [23, Definition 3.1] to obtain the following result, which will be helpful later in showing the uniqueness of a solution to problem (2).

Corollary 3.1 For any
Next, we can pass to the limit n → ∞ and obtain the result.
In order to be able to show the well-posedness of problem (2), we prove a discrete version of Lemma 3.2, which is crucial to establishing a priori estimates in Sect. 5. First, we discretize the time interval [0, T] into n ∈ N equidistant subintervals [t i-1 , t i ] with length τ = T n . The approximation of a function z at time t = t i , 0 ≤ i ≤ n, is denoted by z i . The same notation is also used for a given function. Moreover, we approximate and the time-discrete convolution is defined as follows (a.a. x ∈ ): with In (17), a possible singularity of k at t = 0 is avoided. The proof of the following lemma follows the same lines as the proof of [46, Lemma 3.2] for a solely time-dependent kernel.

Lemma 3.3 Let ⊂ R d be a bounded Lipschitz domain. Let
(v i ) i∈N and (k i ) i∈N be sequences of real-valued functions defined on . Suppose that (v i ) i∈N in L 2 ( ) and that the sequence (k i ) i∈N is positive, uniformly bounded and decreasing, i.e. for a.a. x ∈ it holds that where (k * v) i is the time-discrete convolution defined in (17).
Proof The result follows from multiplying the following inequality with τ , integrating it over and summing up the result over i = 1, . . . , j (note that (k * v 2 ) 0 (x) dx = 0): We prove that this inequality is satisfied. First, notice that for a.a. x ∈ and i ≥ 1, it holds that From the properties of the sequence (k i ) i∈N , it follows for a.a. x ∈ that

Assumptions, weak formulation and uniqueness of a solution
In this section, we first summarise all assumptions that are necessary to obtain the wellposedness result in Theorem 6.1. We assume that • AS-2: The matrix A is uniformly elliptic, i.e. there exists a constant α > 0 such that • AS-6:ũ 0 ∈ H 1 0 ( ). Further, we associate a bilinear form L with the differential operator L defined in (1) as follows: with u(t), ϕ ∈ H 1 0 ( ). Using the properties above, we obtain that and i.e. the bilinear form L is continuous, and H 1 0 ( )-elliptic due to Friedrichs's inequality. Now, we define the variational formulation of problem (2).
Next, we show that conditions (15) and (16) with k = g are satisfied for v = u when the solution u satisfies the regularity assumptions in Definition 4.1. First, we have by Hölder's inequality that Secondly, we see that Thus, conditions (22) and (23) are satisfied for v = u.

Time discretization
In this section, a time-discrete numerical scheme to solve problem (21) is denoted by u i . Moreover, the time derivative at time t = t i is approximated by the backward Euler finite-difference formula These notations are also used for any function z = u. We propose the following timediscrete variational problem: Using the time-discrete convolution (17), the discrete problem can be equivalently written as with and The summation occurring in F i is not contributing for i = 1. The well-posedness of this problem under appropriate assumptions on the data is stated in the following lemma.
Proof From the properties of L and g, it follows that the bilinear form a i is H 1 0 ( )-elliptic and continuous. Ifũ 0 , . . . , u i-1 ∈ L 2 ( ), then the linear functional F i is bounded since The existence and uniqueness of u i ∈ H 1 0 ( ) to problem (25) follows from the Lax-Milgram lemma.
We prove some a priori estimates in the following two lemmas. These are required to ensure the existence of a solution to (21) and to prove the convergence of approximations towards this solution.

Lemma 5.2
Let the assumptions of Lemma 5.1 be fulfilled. Then a positive constant C exists such that, for every j = 1, 2, . . . , n, the following relation holds: Proof We set ϕ = u i τ in (24) and sum it up for i = 1, . . . , j with 1 ≤ j ≤ n. Using relation (19), we obtain that Using the ε-Young inequality, we estimate the second term in the RHS as follows: Using Friedrichs's inequality, we obtain that Employing Lemma 3.3 and Eq. (20) implies Fixing ε 1 > 0 and ε 2 > 0 sufficiently small gives the result.

Lemma 5.3
Let assumptions AS-(1-6) be fulfilled. Then positive constants C and τ 0 exist such that, for any τ < τ 0 and for every j = 1, 2, . . . , n, the following relation holds: Proof We set ϕ = δu i τ (hereũ 0 ∈ H 1 0 ( ) is needed) in (24) and sum the result up for i = 1, . . . , j with 1 ≤ j ≤ n. We obtain that The first term on the left-hand side is positive since we can apply [47,Eq. 3.2]. Next, we recall the following per partes formula for a symmetric bilinear form [48,Eq. 3.16]: Hence, due to the symmetry of A, we have that and thus by Lemma 5.2 we get that Similarly, using the conditions on c and Lemma 5.2, we get that The term on the RHS of (26) can be estimated by using the partial summation rule If f ∈ H 1 ((0, T), L 2 ( )) ⊂ C([0, T], L 2 ( )), then by the ε-Young inequality, Friedrichs's inequality and Lemma 5.2, we get that The result of the lemma follows by fixing ε sufficiently small.
So the first limit transition is satisfied. For the second limit transition, we first note that (g * u) n l (t) = (g n l * u n l )( t τ n l ) for any t ∈ (0, T). Moreover, we have that We show that T 0 t τ t g n l t τs u n l (s) ds, ϕ dt → 0 as l → ∞ and T 0 t 0 g n l t τsg(ts) u n l (s) ds, ϕ dt → 0 as l → ∞ such that limit transition (ii) is satisfied. First, we have for t ∈ (t i-1 , t i ] that For t ∈ (t i-1 , t i ] and a.a. x ∈ , it holds that Then, using Lemma 5.2, we obtain from Eq. (34) for t ∈ (t i-1 , t i ] that which is valid for i = 1, . . . , n. Secondly, we deduce for the second term on the RHS of (33) that 1 2 dx since (remember that the function g is decreasing in time and thus g n l ≤ g) we have for t ∈ (t i- where we used the α-Hölder continuity of f (x) = x α with α ∈ (0, 1]. Next, we see that where we used Young's inequality for convolutions at position ( ). Therefore, using Lemma 5.2, we finally obtain that We conclude that the second limit transmission is valid. Limit transmission (iii) follows from Eq. (30) since using Young's inequality for convolutions at position ( ). Now, we are able to pass to the limit l → ∞ in (32), and we obtain that Next, we differentiate the previous relation with respect to ξ , i.e.

Theorem 6.2 (Uniqueness) Let assumptions AS-(1-6) be fulfilled. Then there exists a unique weak solution u to problem
Proof We show the uniqueness of a solution by contradiction. We suppose that two solutions u 1 and u 2 to (21) exist and set u = u 1u 2 . Then u satisfies (21) with f = 0 in Q T and u(x, 0) = 0 in . Then, we integrate the result over t ∈ (0, η) ⊂ (0, T) and put ϕ = u(η). Afterwards, we integrate in time over η ∈ (0, ξ ) ⊂ (0, T) and obtain that Using the symmetry of A and integration by parts, we see that Hence, using the ε-Young inequality, we see that Similarly, we have that Therefore, fixing ε sufficiently small and applying the Grönwall lemma imply that Hence, we obtain that e * u = 0 a.e. in Q T . From the theory on Volterra equations, it follows that u = 0 a.e. in Q T , cf. [51, Theorem 3.5]. in Theorem 6.2. This step is violated when a Neumann condition is considered on the whole boundary of the domain (A(t)∇u(t) · ν = 0 on ∂ for t > 0). However, we can assume that c ≥ c 0 > 0 in Q T in order to be able to obtain the uniqueness of a solution. In fact, this assumption is also necessary when establishing the a priori estimate in Lemma 5.3 if a Neumann boundary condition is considered.
Remark 6.2 Note that the convergence of Rothe's functions towards the weak solution in Theorem 6.1 is also valid for the entire Rothe's sequence as the solution is unique.
We are not able to show that u(0) =ũ 0 for the space-dependent variable-order fractional derivative under consideration. In the next section, we explain how to obtain this convergence result in the case of a constant-order fractional derivative.
Remark 7.2 (Neumann boundary condition) Theorem 7.1 is also satisfied when considering a homogeneous Neumann boundary condition if c ≥ c 0 > 0.
Remark 7. 3 We use the absolute continuity in time with values in H 1 0 ( ) * of (g * (u -ũ 0 ))(t) to obtain the continuity in time of u(t) with values in H 1 0 ( ) * . We note that no information about ∂ t u itself is obtained.

More regular solution
In this section, we show that the solution u to problem (2) belongs to if the tensor A in the principal part of the differential operator L is a 1 × 1 matrix, i.e. we suppose that

L(x, t)u(x, t) = -∇ · a(x, t)∇u(x, t) + c(x, t)u(x, t).
This leads to the following additional assumptions: • AS-7: β ∈ C( ) with 0 < β(x) ≤ β 1 < 1 for all x ∈ ; • AS-8: A ∈ (L ∞ (Q T )) 1×1 , i.e. A = a, and ∇a ∈ (L ∞ (Q T )) d . We note that the ellipticity assumption AS-2 reduces to The goal is to derive some higher stability results for the discrete solution. We have already from the Lax-Milgram lemma that there exists a unique u i ∈ H 1 0 ( ) solving problem (24) for any i = 1, . . . , n, see Lemma 5.1. Moreover, from (24), it follows that However, as the RHS is an element of L 2 ( ) (thanks to assumption AS-8), we also have that -a i u i is an element of L 2 ( ) for any i = 1, . . . , n. Hence, from the ellipticity assumption, we get that u i ∈ L 2 ( ) for any i = 1, . . . , n. Now, we are able to establish the following a priori estimate.

Lemma 7.1 Let assumptions AS-(1-8) be fulfilled. Then a positive constant C exists such
that, for every j = 1, 2, . . . , n, the following relation holds: (38) byu i τ , integrate the result over and sum up the result for i = 1, . . . j keeping 1 ≤ j ≤ n. We obtain that We apply Green's theorem on the first term on the LHS. Since g is space-independent here and u i | = 0 for i = 0, . . . , n, we get that Hence, using relation (19), we obtain that From Lemma 3.3, again using that g is solely time-dependent, we obtain that Using the ε-Young inequality, we estimate the second term in the RHS as follows: Using Lemma 5.2, we easily see that Therefore, we obtain the following estimate: We conclude the proof by fixing ε 1 and ε 2 sufficiently small.
This latter result implies that the conditions of Corollary 3.1 are satisfied. It is here that we can use this result to obtain in a different way (before we used the positive definiteness of the kernel) the uniqueness of solution to problem (21). We consider u = u 1u 2 with u 1 and u 2 different solutions to (21). Thus u satisfies (21) with f = 0 in Q T and u(·, 0) = 0 in . Therefore, taking ϕ = u(t) in (21) and integrating with respect to time over (0, η) ⊂ (0, T) gives From Corollary 3.1, it follows for any η ∈ (0, T) that (note that β 1 = β in (8)) i.e. u = 0 a.e. in Q T . Remark 7.4 (Neumann boundary condition) We discussed before in Remark 6.1 that the coefficient c needs to satisfy additionally c ≥ c 0 > 0 (next to assumptions AS-(1-6)) in order to obtain the uniqueness of a solution and Lemma 5.3 when a Neumann condition is considered on the whole boundary of the domain. We note that Eq. (39) is also satisfied without additional assumption on c in the case of a Neumann boundary condition on the complete boundary (as f ≡ 0 in the proof of uniqueness). Moreover, from [54, Theorem 2.50], it follows that the norms u + u and u H 2 ( ) are equivalent for u ∈ H 2 ( ) satisfying ∇u · ν = 0 on ∂ . This implies that Theorem 7.2 is also satisfied when considering a homogeneous Neumann boundary condition if c ≥ c 0 > 0.

Lemma 7.2 Let assumptions AS-
Proof We start with multiplying (38) byδu i τ , integrating the result over and summing it up for i = 1, . . . j keeping 1 ≤ j ≤ n. We get by AS-10 and Green's theorem the following equality: Further, we follow the lines of the proof of Lemma 5.3. The first term on the LHS of (40) is positive, and for the second we use (27) in order to obtain that Using again (27) and the result of Lemma 5.3 gives us If f ∈ H 1 ((0, T), L 2 ( )), then the term on the RHS of (40) can be estimated as follows: The result of the lemma follows by fixing ε sufficiently small.

Numerical experiments
In this subsection, we first test the performance of the time-discrete scheme (25) (temporal error) for solving (2) for a smooth solution and a typical solution given by respectively. We will perform several numerical experiments for different values of the order of the fractional derivative β. We consider = (0, 1) and T = 1. We solve problems (25) at each time step by the finite element method (FEM) using the first-order (P1-FEM) Lagrange polynomials for the space discretization, where we take the number of space discretization intervals equal to 128. We use the finite element library DOLFIN [55,56] from the FEniCS project [57,58] to solve these problems. The CPU time (in seconds, Intel® Core TM i7-1065G7 Processor) increases fast when increasing the number of time discretization intervals, as at each time step one has to use the numerical solutions at all preceding time levels. In the numerical examples, we consider the following errors: The error E n conv is motivated by considering that we have ∂ t (g * (u -ũ 0 ) in the governing PDE instead of ∂ t u in a classical parabolic PDE. In the first experiment, we consider the smooth solution u   (17): log 2 E n conv conclude that the asymptotic rate of convergence for E n max is of order τ β , but a very fine timestep τ will be needed to obtain this convergence rate, which will lead to a huge computational complexity. Moreover, it is clear that we have O(τ 1-β ) convergence for E n conv . The same conclusions can be drawn from Figs. 3 and 4, where we investigate the algorithm in case of u 2,β ex for the different values of β ∈ {0.2, 0.4, 0.6, 0.8}. For these typical solutions, the timestep τ needs to be smaller in comparison with the smooth solutions (especially for small β = 0.2). The numerical results obtained for these examples validate the convergence of the proposed algorithms. Important to note is that E n max = max x∈ |u n (x, t 1 )u ex (x, t 1 )| in all these experiments. An interesting direction for future research is to investigate theoretically the order of convergence of E n max and E n conv . As we mentioned before, in order to obtain good numerical approximations, a very small timestep τ is needed, which is very time consuming. We may conclude that the time-discrete scheme (25) mainly has theoretical advantages. For this reason, we try to improve the time-discrete scheme (25) (leading to the theoretical results obtained in this paper) from computational viewpoint by considering a graded mesh and allocating more time-points around t = 0 [59][60][61]. We consider a graded time-partitioning of the form t j = T(j/n) r for j = 0, 1, . . . , n, where the constant mesh grading is assumed to satisfy r ≥ 1. We put τ j := t jt j-1 for j = 1, . . . , n and consider the following time-discrete convolution on the graded mesh: If r = 1, then the mesh is uniform and we easily see that we get the approximation (17) on a uniform mesh. The use of a graded mesh increases the temporal mesh near t = T (it is coarser), and so it is possible that the error around t = T starts to dominate the error around the initial time t = 0 (possibly also the space discretization error starts to interfere the results). This is the reason why we also calculate the maximum error on the initial time layer I := [0, t 1 ] and investigate its behaviour, i.e.
In case of the graded mesh, the convergence rates are derived as follows: In the experiments, we take r = 2-β β in accordance with the optimal mesh grading for the classical L1-approximation for the Caputo fractional derivative [41,62]. In Tables 1 and 2, we give the maximum errors and the orders of convergence for u 1,β ex and u 2,β ex , respectively. For r = 2-β β , we see that the asymptotic convergence rate of E n max I is 2β, and we again observe that the asymptotic convergence rate of E n conv is 1β. From these experiments, we are not able to make conclusions for the convergence rate of E n max . However, it is clear that the maximum error is smaller in comparison with the uniform scheme (17). The same order of accuracy can only be obtained for the uniform scheme in the case of a very small timestep τ .  Finally, we compare these results with the results obtained when discretising the Caputo time fractional derivative using the well-known L1-algorithm on uniform [63] and graded meshes [41]. The L1-approximation is defined as follows: where g α (t) := t α-1 (α) . Hence, the L1-approximation to the Caputo fractional derivative of order β ∈ (0, 1) at the time point t i is given by Therefore, using this approximation, the time-discrete scheme (25) becomes Now, we perform the same experiments as before using the L1-approximation. The results for a uniform mesh (i.e. (42) with r = 1) are plotted in Figs. 5-8. The maximum error is smaller in comparison with the uniform time-discrete convolution (17). The results suggest an asymptotic convergence rate of O(τ β ) for E n max and of O(τ ) for E n conv when using  (42). The results for the graded mesh with r = 2-β β are given in Tables 3 and 4. We again see that the asymptotic convergence rate of E n max I is 2β and that the asymptotic convergence rate of E n conv is 1. We observe that the error away from t = 0 can dominate the error around the initial time, which gives a possible explanation for not obtaining the convergence rate of 2β for E n max as was obtained before in [41]. We note that the maximum error E n max I is smaller in comparison with using the time-discrete convolution quadrature (41) with r = 2-β β for every experiment. However, we observe that the maximum error E n max is (slightly) smaller in comparison with using the time-discrete convolution (41) with r = 2-β β for β = 0.6 and β = 0.8, but it is (slightly) larger for β = 0.2 and β = 0.4. Further research should be undertaken to investigate these observations in more detail.

Multiple regions
In this section, we consider a composite medium consisting out of N separated subdomains (e.g. layers). For ease of exposition, we take N = 2 and make the following assumptions: • AS-11: = 1 ∪ 2 ∪ S, where 1 and 2 are non-overlapping Lipschitz subdomains with interface S := 1 ∩ 2 ; • AS-12: The normal flux of u is continuous along S, i.e. A∇u(t) · n S = A 2 ∇u 2 (t) -A 1 ∇u 1 (t) · n = 0, t ∈ (0, T], (43) where n denotes the outer normal on 1 , and u 1 , u 2 are the limiting values of the function u as S is approached from 1 , 2 , respectively. The weak formulation (4.1) is also satisfied in this situation as AS-12 implies that for all ϕ ∈ H 1 0 ( ) it holds that The analysis in Sect. 6 stays valid. Therefore, we can conclude the following theorem.

Lemma 8.1 Let assumptions AS-
We multiply (38) by -a u i τ and integrate the result over 1 and 2 , respectively. In particular for the first term on the LHS, using ∇g = 0 in 1 and 2 , we obtain that Hence, by u i ∈ H 1 0 ( ) and AS-12, we get that -(g * δu) i , a u i = (g * ∇δu) i , a∇u i + (g * δu) i , [[a∇u i · n]] S (43) = (g * ∇δu) i , a∇u i .
We add up the resulting equations on 1 and 2 and sum the result over i = 1, . . . j with 1 ≤ j ≤ n. Using relation (19), we get that We obtain from Lemma 3.3 that All the other terms in (45) can be handled as in Lemma 7.1, and so we can conclude the proof.
We do not know how to get control over the boundary term as l 1 = l 2 . However, ifũ 0 in H 2 ( ), then we can show the continuity in time by multiplying (38) by -a δu i τ . Then, as in the proof of Lemma 8.1, we have that -(g * δu) i , a δu i = (g * ∇δu) i , a∇δu i + (g * δu) i , [[a∇δu i · n]] S .
Hence, we obtain (40) and we can proceed as in Lemma 7.2.

Conclusion
We have investigated an initial-boundary value problem for a fractional diffusion equation with space-dependent variable order where the coefficients are dependent on spatial and time variables. First, we have studied the properties of the governing kernel in Sect. 2. Afterwards, in Sect. 3, we have generalised a fundamental identity for integro-differential operators of the form d dt (k * v)(t) to a convolution kernel that is also space-dependent. By the aid of a convolution quadrature on a uniform mesh, we have proven the existence and uniqueness of a unique weak solution to the problem by aid of Rothe's method in the next sections. Moreover, we discussed in detail the constant-order case and under which assumptions a more regular solution can be obtained. We considered a time-discrete convolution on a graded mesh in order to improve the computational results. We also investigated a composite medium consisting of a finite number of separated subdomains. An interesting direction for future research is to investigate the well-posedness of the fractional wave equation of variable order.