Error estimation using neural network technique for solving ordinary differential equations

In this paper, we present a numerical method to solve ordinary differential equations (ODEs) by using neural network techniques in a deferred correction method framework. Similar to the deferred or error correction techniques, a provisional solution of the ODE is preferentially calculated by any lower-order scheme to satisfy given initial conditions, and the corresponding error is investigated by fully connected neural networks and structured to obtain sufficient magnitude of the error. Numerical examples are illustrated to demonstrate the efficiency of the proposed scheme.


Introduction
Solving ordinary differential equations (ODEs) has been paid lots of attention by many scientists and mathematicians due to its importance in various fields of sciences and engineering. For this reason, several numerical techniques for solving ODEs have been developed during last few decades. Also, the numerical methods can be broadly classified into the following categories: the first class consists of one-step multistage techniques such as Runge-Kutta-type methods [5,13,17], the second includes BDF-type multistep methods [6], and the last is a group of deferred or error correction methods [4,7,18,19] such as spectral deferred correction (SDC) methods [8,11], etc.
In particular, in [15], Krylov deferred correction method (KDC), one of deferred correction methods, has been introduced for getting more accurate and higher-order solutions of various differential equations, in which the numerical solution and the corresponding error at each integration step are calculated at the same time, so that the final algorithm can control the error and have good properties such as higher convergence order, better stability, and higher accuracy, etc., compared with the existing numerical techniques.
Apart from this way, with the development of artificial intelligence and computer technology, many researchers have recently paid tremendous attention to develop neural network techniques. Neural networks have been broadly used in many research fields such as pattern recognition [23,24], speech recognition [10,16], image processing [12,27,30], forecasting [2,3], classification [20,26], etc. For this reason, lots of neural network methods are currently developed and widely used. Based on the advantages of such neural network techniques, there are some attempts to use the neural network techniques for solving various mathematical problems such as differential equations. Researchers use multilayer perceptron neural network [21,31], radial basis function neural network [21,25], finite element neural network [28], wavelet neural network [33], etc.
Based on these developments of numerical techniques to resolve mathematical problems, in this study, we especially focus on deferred correction schemes to solve ODEs. Usually, the existing numerical schemes are searching for the solutions of given differential systems. Unlike the traditional numerical schemes, the deferred or error correction schemes are investigated for the numerical errors with a provisional solution which is preferentially calculated by any numerical scheme. It is already shown that these schemes can have higher convergence order and higher accuracy without any loss of stability [4,7,18,19,29].
In addition to the deferred correction schemes, we consider the neural network techniques to solve the ODEs. Recently, several researchers have attempted to solve various differential equations by using the neural network techniques. For example, in [22], a trial solution of differential equations with initial and boundary values is written as a sum of two parts, one is represented as a function which can manage a given initial or boundary conditions and the other consists of a feedforward neural network which is independent of the initial and boundary conditions. In [32], a Legendre neural network method for ODEs is presented by representing a trial solution by Legendre network, in which a Legendre polynomial is chosen as a basis function of hidden neurons and a single hidden layer Legendre neural network is used to eliminate the hidden layer by expanding the input pattern using Legendre polynomials. Here, an improved extreme learning machine algorithm was used for training network weights.
The main objective of this paper is to develop a new algorithm to solve ODEs by using neural network techniques to estimate the numerical error with a calculated provisional solution. First of all, we begin by using a lower-order numerical scheme such as the firstorder Euler method or the second-order midpoint method for the provisional solution. Note that for getting much higher accuracy, we may employ any elaborate higher-order numerical scheme but it may cause enormous computational costs for estimating the provisional solutions. Since neural network techniques also require a certain amount of computational costs, the usage of higher-order methods is meaningless in the proposed algorithm. Once the provisional solution is obtained, the corresponding error is estimated by a full connected neural network. In particular, we set up the corresponding error according to the convergence order of the numerical schemes for the provisional solutions to obtain sufficient magnitudes of the corresponding error. For an assessment of the effectiveness of the proposed algorithm, several experiments are simulated, and, especially, a harmonic oscillator problem is solved to examine the effectiveness of the Hamiltonian property. Throughout these numerical tests, we show that the proposed method works very well and has good properties.
This paper is organized as follows. It starts with some explanations of the basic knowledge that lead to the proposed scheme in Sect. 2.1. In Sect. 2.2, we present our scheme by using neural network systems, in which a provisional solution of the given system is roughly calculated by a lower-order numerical scheme, after that the corresponding error is estimated by traditional fully neural network techniques. In Sect. 3, several numerical results are presented to examine the effectiveness and efficiency of the proposed scheme. Finally, in Sect. 4, we summarize our results and discuss several possibilities to increase the efficiency of the proposed scheme.

Preliminaries
In this subsection, we briefly explain the basics needed for the proposed scheme to solve a general ODE system described by with the initial condition y(0) = y 0 in a given time interval [t 0 , t f ]. Here, t 0 is the initial time and t f is the final time. Also, we assume that the solutions of the given problem in Eq. (1) are continuous. Usually, in traditional numerical methods, for getting numerical solutions of the problem in Eq. (1), the given time interval is discretized into several subintervals. With the initial conditions, the solution in the first subinterval is numerically calculated and that solution will be an initial condition of the next subinterval. This process is sequentially continued, and the final solution can eventually be obtained at the final time t f .
On the other hand, unlike the traditional numerical schemes to solve ODEs, the neural network schemes for solving ODEs have different structure. Most traditional numerical schemes usually have a sequential process to march from an initial time to a final time point, whereas neural network schemes simultaneously seek solutions at all time points. The solution used in neural network techniques can be represented as where t 0 is an initial time point, t f is the final time point, and N(t, w) is a feedforward neural network with parameters w and an input vector t. The first term A(t) usually represents given initial or boundary conditions. The second term G(t, N(t, w)) is constructed so as not to contribute to the initial or boundary conditions, since it must satisfy them in the first part. This term employs a neural network whose weights w are to be adjusted in order to deal with the minimization problem. Note that the problem has been reduced from the original constrained optimization problem to an unconstrained one due to the form of the trial solution that satisfies by construction of the initial or boundary conditions. Once the numerical solution y is set up, a cost function G(w) with the weights w of the neural network is defined as where y is defined in Eq. (2) and F is defined in Eq. (1). Based on the cost function defined above, the weights w should be found by one of various optimization techniques. The most basic technique is the gradient descent algorithm which is an iterative minimization technique for finding a local minimum of the given cost function. The algorithm has the following two steps and processes repeatedly: • The gradient is calculated by the first-order derivative of the cost function G(w) at a point.
• The algorithm moves in the opposite direction of the gradient Note that to find a local minimum of a function based on the gradient descent, we must take steps proportional to the negative of the gradient (move away from the previous point) of the cost function at the current point. Also, γ is a learning rate which is a tuning parameter in the minimization process and decides the length of the steps. It means that if the learning rate is too high, we might overshoot a local minimum and keep bouncing, without reaching the desired minimum, whereas if the learning rate is too small, the training takes too much time, so the computational cost gets too large. However, most cost functions usually contain several local minima. The gradient may reach any such minimum, which depends on both the initial point and learning rate. Due to this reason, the gradient descent optimization technique may converge to different points whenever executing with different initial points and learning rate, which is a weakness of the gradient descent technique.

Method description
The main objective of this subsection is to introduce the propose scheme using the neural network based on deferred correction framework. Note that the neural network used in this work is a simple fully connected neural network in order to exclude the efficiency or reliability of neural networks and focus only on the effectiveness of the proposed method.
Basically, we focus on the calculation of the numerical error unlike the traditional numerical methods which directly estimate solutions of given equations. That is, instead of solving for y(t) in Eq. (1), a provisional solutionŷ(t) is first obtained by using any lowerorder method or initial conditions and then the corresponding error E(t) is defined by y(t) -ŷ(t). In a similar way to deferred or error correction techniques, E(t) can be estimated by any neural network algorithm.
Here, we simply try to use the first-order numerical scheme as a provisional solution, such as Euler method. Recall that at the ith time point, Euler method can be described aŝ where h i = t it i-1 andŷ(t 0 ) = y 0 . Based on the provisional solutionŷ calculated above, we cast a neural network technique to estimate the error function E(t) = y(t) -ŷ(t). As explained in Eq. (2), the estimated solution y(t) can be represented as where G(·, ·) is an appropriate function of t and N(t, w) for estimating the corresponding error term and N(t, w) is a single-output feedforward neural network with parameters w and n-input units fed with the input time vector t. The first termŷ(t) is a provisional solution calculated in Eq. (5). Sinceŷ(t) is the first-order solution, the error term G (t, N(t, w)) should have second-order magnitude.  (1)) over desired time points or time intervals Initialize basic parameters for the neural network (learning rate γ , weights w, tolerance tol) Perform a lower-order scheme to calculate a provisional solutionŷ(t) with the (p -1)th order method or an initial condition Define the error On the other hand, near the initial point t = 0, the Taylor expansion of the y(t) can be represented as follows: where t ∈ [0, 1). By the first-order Euler scheme, the first two terms in Eq. (7) can be estimated, so the error should contain terms from the t 2 term. Therefore, the error function G(t, N(t, w)) in Eq. (6) comprises the t 2 term and beyond. As t ∈ [0, 1), the t 2 term is dominant, so we can define G(t, N(t, w)) as t 2 N(t, w). Eventually, the final form of the desired solution in Eq. (6) can be summarized as whereŷ(t) is the first-order estimate. Based on the whole discussion above and background, we get neural network algorithm to solve ODEs (see Algorithm 1).

Experiments
In this section, we test several examples to examine the effectiveness of the proposed scheme and compare the results with own exact solutions. Note that there are several minimization algorithms used in the neural network techniques. As mentioned above, we concentrate only on the efficiency of the proposed algorithm for solving ODE systems, without any aid of effects caused by other techniques such as the choice of neural networks, or minimization schemes, etc. Therefore, for these experiments, we simply use the basic gradient descent method as a minimization tool with the fixed learning rate γ = 0.001. The details of each problem will be explained in each subsection.
All numerical results are obtained using Python 2.1.5, on a computer with 11th Gen INTEL Core I7-1165G7 CPU, 16 GB memory, and WIN10 operating system. All computational codes including neural networks and minimization schemes are implemented in Python by ourselves without any usage of libraries or packages. Also, since a simple neural network is used for this work, only 3 layers are used.

Example 1
For the first example, we test the simplest form of the ODE described by with the initial condition y(0) = 1. The time interval is [0, 1], which is uniformly discretized into 10 subintervals. That is, 11 node points are used in the input and middle layers of the neural network. The analytic solution is y(t) = exp(λt). Note that to sustain the numerical stability, λ in (9) should be negative [1,9,14]. For this test, we simply set λ = -1.
Since a provisional solutionŷ is estimated by the first-order explicit Euler method, the corresponding error has the second-order O(h 2 ) magnitude, so the neural network term can be set up to be t 2 N(t, w). The network was trained on the 10 subinterval points in [0, 1]. All numerical results are plotted in Fig. 1(a) and compared with the analytic and provisional solutions. It can be seen that the proposed scheme is closer to the analytic solution, so we can conclude that the proposed scheme produces quite reasonable results.
For further examination of the accuracy, we check the difference between the analytic and proposed solutions and plot it in Fig. 1(b). To precisely compare the difference with the provisional solution, the L 2 -norm is measured. The residual between the analytic solution and proposed scheme and between the analytic one and the provisional solution are 0.05277 and 0.15504, respectively. Summing up these results, we can easily see that the proposed scheme works well and has a good accuracy for this problem.  Additionally, to investigate the effect of the stiffness in neural network techniques, we apply the propose scheme to a problem with a stiff component, so λ is set to -10 in problem (9) and is required to be mildly stiff. Since it becomes stiff, the corresponding provisional solution should be obtained from an implicit method as long as the same step size is persistently used in the nonstiff case. Hence, the provisional solution at the ith time integration point can be obtained by the first-order implicit Euler method described bŷ where h is a step size andŷ(t 0 ) = 100. The analytic solution is represented as y(t) = exp(-10t). With the provisional solution having second-order magnitude, the solution is taken as y(t) =ŷ(t) + t 2 N(t, w). Other conditions for the neural network are the same as above. Figure 2(a) displays the solution behaviors of proposed schemes and comparisons with the analytic solution. Note that due to the stiff component, the solution improved rapidly. Here, the figure is plotted in the log-scale to observe the magnitude of the solution. It can show that the results from the proposed scheme are closer to the analytic solution. To precisely check the residual, we plot the error between the results from the proposed scheme and the analytic solution in Fig. 2(b). It shows that the proposed scheme works well even for the stiff problem. However, the results are not perfectly satisfactory and we need to consider other possibilities to improve the results for stiff problems. Actually, there are lots of components to control in the neural network techniques, such as several choices of the minimization technique, free parameters in each minimization, the number of the free parameters, etc.

Example 2
Next, we consider the following ODE: is y(t) = exp(-t 2 /2) 1+t+t 3 + t 2 . Similarly as above, a provisional solutionŷ is estimated by the first-order explicit midpoint method, so the corresponding error has second-order O(h 2 ) magnitude and the error is set to t 2 N(t, w). We plot the results in Fig. 3(a) and compare them with the analytic and provisional solutions. We easily check that the proposed scheme is quite close to the analytic solution. To examine the residual between the analytic solution and the results above, we plot each residual in Fig. 3(b). It can be concluded that the proposed scheme works well and has a good accuracy for this problem. Also, we check the computational time for the proposed scheme. Notice that for increasing the reliability of computational time, we take the average by executing this code 100 times. It needs 6.2 seconds CPU time and 617 iterations for minimization of the parameters w.
Next, to investigate the effect of the convergence magnitude in the proposed neural network technique, we try to use higher-order provisional solutions. First, a provisional solutionŷ is estimated by the second-order explicit midpoint method as described bŷ where h is a step size andŷ(t 0 ) = y 0 . Therefore, the corresponding error is O(h 3 ), so the neural solution can be defined as t 3 N(t, w). The results concerning the accuracy at grid points are presented in Fig. 4(a) with comparisons of the analytic and provisional solutions. It can be seen that the proposed scheme is closer to the analytic solution.
Since a higher-order solution is much closer to the analytic solution, we plot the residual between the analytic solution and the proposed one in Fig. 4(b) to inspect in details the accuracy of the proposed scheme. One can see that the proposed higher-order scheme overall has a quite smaller error compared with the second-order convergence method, although there are a few parts which have a larger error.

Example 3
In this subsection, the following differential equation is considered: with the initial condition y(0) = 0. The exact solutions is y(t) = exp(-t/5) sin(t). For the experiments, we uniformly use 10 node points from [0, 2] in the input and middle layers.
As done above, we estimate a provisional solutionŷ by the first-order explicit Euler method, so the corresponding error can be O(h 2 ). The trial solution is formed to be y(t) =ŷ + t 2 N(t, w). With the same setting for the neural network scheme, we generate all numerical results and plot them in Fig. 5(a). Also the results are compared with the analytic and provisional solutions. It can be seen that the proposed scheme is closer to the analytic solution. To take a closer look at the accuracy, we calculate the difference between the analytic solution and the proposed one and plot it in Fig. 5(b). The L 2 -norm of the difference between the results from the proposed scheme and the analytic solution Also, we check the error accuracy versus computational time of the proposed scheme. Since the computational time depends on the iteration times in the minimization scheme, we measure computational times and error accuracy by varying the iteration times. For the 100 iteration times, the accuracy is 0.17968 and it needs 1.1 seconds, and for the 2000 iterations, the accuracy is 0.02997 and the computational time is 10.5 seconds. Therefore, one can easily conclude that the more iterations we use, the more accurate results are generated. To support this argument, we plot error behavior according to the iterations in Fig. 6.

Example 4
As the last example, we consider a simple Hamiltonian system known as a harmonic oscillator: with the initial condition y(0) = [0, 1]. The exact solutions is [y 1 (t), y 2 (t)] = [sin t, cos t]. For the experiments, we uniformly use 10 node points from [0, 2] in the input and middle layers. Since the given system is Hamiltonian, a provisional solutionŷ is estimated by the firstorder implicit Euler method described in Eq. (10), due to its stability. The trial solutions are set to y 1 (t) =ŷ 1 + t 2 N 1 (t, w) and y 2 (t) =ŷ 2 + t 2 N 2 (t, w). We simulate this vector system and generate the numerical result as seen in Fig. 7. The results are compared with the analytic and provisional solutions. It can be seen that both numerical solutions of the system have a quite good accuracy.
Additionally, a Hamiltonian system is a dynamical system described by the scalar function H, called the Hamiltonian. In this problem, the Hamiltonian H can be defined as and the value of H should be conserved. To examine the conservation property, we plot an orbit of solutions in Fig. 8(a) and the Hamiltonian H in Fig. 8(b). Figure 8(a) shows that the proposed scheme is closer to the analytic solution. Also, one can verify in Fig. 8(b) that the implicit Euler method reduces the total energy of H, whereas results obtained from the proposed scheme try to conserve the energy after a certain moment.

Discussion
In this paper, we introduce a new variation of the neural network techniques to solve ordinary differential equations. Unlike the traditional techniques which directly estimate solutions, the proposed neural network scheme estimates the corresponding error based on the calculated provisional solution by lower-order numerical schemes. Also, the proposed scheme is designed with consideration to estimate sufficient magnitudes of the corresponding error according to the convergence order of the lower-order numerical scheme used for the provisional solutions. Several numerical results show that the proposed scheme can get better accuracy, compared with existing techniques. In order to improve the efficiency of the proposed scheme, we should consider several issues. The first is to optimize the several parameters and choices of optimization which can be controlled in neural networks. The second is to investigate the strategies for stiff problems as seen in Example 1 or to design a new neural network algorithm for Hamiltonian systems, such as symplectic neural networks to keep the energy of the Hamiltonian. Lastly, for more accurate results, we need to employ higher-order provisional solutions. Results along these directions will be reported soon.

Funding
The first author Nam was supported by the basic science research program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (grant number NRF-2019R1I1A3A01059010) and the second author Baek and the corresponding author Bu were supported by the basic science research program through the National Research Foundation of Korea (NRF) funded by the Korea government (MSIT) (grant number NRF-2022R1A2C1004588).

Availability of data and materials
Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.