Abstract: We study the efficiency of non-parametric estimation of diffusions (stochastic differential equations driven by Brownian motion) from long stationary trajectories. First, we introduce estimators based on conditional expectation, which are motivated by the definition of the drift and diffusion coefficients. These estimators involve time- and space-discretization parameters for computing expected values from discretely-sampled stationary data. Next, we analyze the consistency and mean squared error of these estimators as functions of the computational parameters. We derive relationships between the number of observational points and the time- and space-discretization parameters in order to achieve the optimal speed of convergence and minimize computational complexity. We illustrate our approach with numerical simulations.
What problem does this paper attempt to address?
This paper addresses how to perform non-parametric estimation of stochastic differential equations (SDEs) effectively from discretely-sampled stationary data. Specifically, the author studies how to select the computational parameters (the observation time step $\Delta t$, the spatial discretization step $\Delta x$, and the number of observation time instances $M$) so that the non-parametric estimators of the drift and diffusion coefficients perform as well as possible while the computational cost and the cost of generating data are kept low.
### Background and Motivation
In recent years, the amount of available observational data has increased significantly, and fields such as biology, earth science, and social science now supply many data sets that need to be analyzed. A common task is to fit an empirical model to the available stationary data, either to predict future values or to generate trajectories with similar statistical properties. Such requirements arise, for example, in turbulence, reduced-order modeling of nonlinear dynamics, and biology. Extracting useful information from these data efficiently, and in particular estimating the drift and diffusion coefficients of a stochastic differential equation, has therefore become an important research topic.
### Research Objectives
The main objective of this paper is to determine how to choose the computational parameters optimally when stochastic differential equations are estimated non-parametrically, via conditional expectations, from discretely-sampled stationary data. Compared with parametric methods, non-parametric methods are usually more flexible because they do not depend on specific functional forms of the drift and diffusion coefficients. However, they are also more demanding computationally. This paper therefore aims to provide an approach that reduces computational complexity while maintaining estimation accuracy.
### Methods and Techniques
1. **Construction of Non-parametric Estimators**:
- The author defines estimators for the drift coefficient and the diffusion coefficient based on conditional expectation.
- These estimators involve the time discretization parameter $\Delta t$ and the spatial discretization parameter $\Delta x$ and are used to calculate expected values from discretely-sampled stationary data.
2. **Analysis of Consistency and Mean-squared Error**:
- The author analyzes the consistency and mean-squared error of these estimators and derives the relationships between the number of observation points, the time discretization parameter, and the spatial discretization parameter to achieve the optimal convergence rate and the minimum computational complexity.
3. **Numerical Simulation**:
- The author verifies the results of the theoretical analysis through numerical simulation and shows the performance of the estimators under different parameter settings.
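
The construction in step 1 can be illustrated with a minimal sketch. It assumes the common binned form of a conditional-expectation estimator: increments $X_{t+\Delta t}-X_t$ are averaged over all samples whose starting point lies within $\Delta x/2$ of a grid point $x_k$, then divided by $\Delta t$. The exact estimators in the paper may differ in detail. The function names `simulate_ou` and `binned_estimators`, the Ornstein-Uhlenbeck test process, and all parameter values are assumptions made for illustration only.

```python
import numpy as np


def simulate_ou(n_steps, dt, theta=1.0, sigma=1.0, seed=0):
    """Euler-Maruyama path of dX = -theta*X dt + sigma dW (illustrative test data)."""
    rng = np.random.default_rng(seed)
    noise = sigma * np.sqrt(dt) * rng.standard_normal(n_steps)
    x = np.empty(n_steps + 1)
    x[0] = 0.0
    for i in range(n_steps):
        x[i + 1] = x[i] - theta * x[i] * dt + noise[i]
    return x


def binned_estimators(x, dt, dx, centers):
    """Binned conditional-expectation estimates of drift A and squared diffusion D^2.

    For each center x_k, average the increments (and squared increments) of
    all samples that start in [x_k - dx/2, x_k + dx/2), then divide by dt.
    Bins with no samples are returned as NaN.
    """
    increments = np.diff(x)   # X_{t+dt} - X_t
    starts = x[:-1]           # X_t
    drift = np.full(len(centers), np.nan)
    diff_sq = np.full(len(centers), np.nan)
    for k, xk in enumerate(centers):
        in_bin = np.abs(starts - xk) < dx / 2
        if in_bin.any():
            drift[k] = increments[in_bin].mean() / dt
            diff_sq[k] = (increments[in_bin] ** 2).mean() / dt
    return drift, diff_sq


# Hypothetical parameter choices: M = 200_000 samples, dt = 0.01, dx = 0.2.
path = simulate_ou(n_steps=200_000, dt=0.01)
centers = np.array([-0.5, 0.0, 0.5])
drift_hat, diff_sq_hat = binned_estimators(path, dt=0.01, dx=0.2, centers=centers)
print("drift estimates:      ", drift_hat)
print("diffusion^2 estimates:", diff_sq_hat)
```

For this test process the true drift is $-x$ and the true squared diffusion is $1$, so the printed estimates can be compared against those values; the drift estimates are noticeably noisier than the diffusion estimates at the same sample size.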
### Main Results
- **Bias Analysis**:
- The drift estimator $\hat{A}(x_k)$ and the diffusion estimator $\hat{D}^2(x_k)$ are biased for finite $\Delta t$ and $\Delta x$, but the bias vanishes as $\Delta t$ and $\Delta x$ tend to zero.
- To leading order, the bias of each estimator is $C(\Delta x^2+\Delta t)$, where the constant $C$ may depend on $x_k$.
- **Mean-squared Error Analysis**:
- The mean-squared error (MSE) of the drift estimator is $C(\sqrt{\Delta t}+\frac{1}{M\Delta t}+(\Delta x)^2+\Delta x\sqrt{\Delta t})$.
- The mean-squared error of the diffusion estimator is $C(\frac{1}{M}+\Delta x+\Delta t)$.
- In order to make the MSE tend to zero, it is required that $M\Delta t\rightarrow\infty$, $\Delta t\rightarrow0$, $\Delta x\rightarrow0$, and $\Delta x\sqrt{\Delta t}\rightarrow0$.
- **Numerical Simulation Results**:
- The numerical simulation results show that a larger $\Delta x$ has little impact on the estimation accuracy, while a smaller $\Delta t$ is more conducive to improving the estimation accuracy.
- In practical applications, the suggested scaling is $\Delta x\sim\Delta t^{1/2+\epsilon}$.
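
To get a feel for how the terms in the drift MSE expression above trade off, the following sketch evaluates each term separately for a given choice of $M$, $\Delta t$, and $\Delta x$. The unknown constant $C$ is set to 1, so only the relative sizes of the terms are meaningful. The helper names `drift_mse_terms` and `suggested_dx`, and the default value `eps=0.1`, are assumptions for illustration and do not come from the paper.

```python
import math


def drift_mse_terms(M, dt, dx):
    """Terms of the quoted drift-MSE expression, with the constant C set to 1."""
    return {
        "sqrt(dt)": math.sqrt(dt),
        "1/(M*dt)": 1.0 / (M * dt),
        "dx^2": dx ** 2,
        "dx*sqrt(dt)": dx * math.sqrt(dt),
    }


def suggested_dx(dt, eps=0.1):
    """Spatial step following the scaling dx ~ dt^(1/2 + eps); eps is a free choice."""
    return dt ** (0.5 + eps)


# Hypothetical setting: one million observations with time step 0.01.
dt = 0.01
dx = suggested_dx(dt)
for name, value in drift_mse_terms(M=10**6, dt=dt, dx=dx).items():
    print(f"{name:12s} {value:.3e}")
```

Evaluating the terms this way makes it easy to see which one dominates for a given data set: the $\frac{1}{M\Delta t}$ term shrinks only when the total observation time $M\Delta t$ grows, which is why shrinking $\Delta t$ at fixed $M$ eventually stops helping.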