A Data-Adaptive Prior for Bayesian Learning of Kernels in Operators

Neil K. Chada, Quanjun Lang, Fei Lu, Xiong Wang
2024-10-18
Abstract: Kernels are efficient in representing nonlocal dependence and they are widely used to design operators between function spaces. Thus, learning kernels in operators from data is an inverse problem of general interest. Due to the nonlocal dependence, the inverse problem can be severely ill-posed with a data-dependent singular inversion operator. The Bayesian approach overcomes the ill-posedness through a non-degenerate prior. However, a fixed non-degenerate prior leads to a divergent posterior mean when the observation noise becomes small, if the data induces a perturbation in the eigenspace of zero eigenvalues of the inversion operator. We introduce a data-adaptive prior to achieve a stable posterior whose mean always has a small noise limit. The data-adaptive prior's covariance is the inversion operator with a hyper-parameter selected adaptive to data by the L-curve method. Furthermore, we provide a detailed analysis on the computational practice of the data-adaptive prior, and demonstrate it on Toeplitz matrices and integral operators. Numerical tests show that a fixed prior can lead to a divergent posterior mean in the presence of any of the four types of errors: discretization error, model error, partial observation and wrong noise assumption. In contrast, the data-adaptive prior always attains posterior means with small noise limits.
Machine Learning, Computation
What problem does this paper attempt to address?
The paper addresses the ill-posed inverse problem that arises when learning kernels in operators from data. Specifically:

1. **Ill-posedness Caused by Nonlocal Dependence**: Because kernels represent nonlocal dependence, learning a kernel from data is an ill-posed inverse problem. The ill-posedness shows up as a data-dependent inversion operator that is singular, that is, it has zero eigenvalues.
2. **Limitations of Traditional Bayesian Methods**: Traditional Bayesian methods handle ill-posedness with a non-degenerate prior. With a fixed prior of this kind, the posterior mean can diverge as the observation noise becomes small, which happens when the data induces a perturbation in the null space (the eigenspace of zero eigenvalues) of the inversion operator.
3. **Proposal of Data-Adaptive Priors**: To overcome these problems, the paper proposes a data-adaptive Reproducing Kernel Hilbert Space (RKHS) prior. This prior yields a posterior mean that always has a small-noise limit, and it outperforms fixed non-degenerate priors in the numerical experiments.

### Specific Problem Description

- **Problem Background**:
  - Kernels represent nonlocal dependence efficiently and are widely used to design operators between function spaces.
  - Learning the kernel in an operator is a linear inverse problem, but because of nonlocal dependence and various perturbations (such as data noise, numerical errors or model errors), the problem is usually severely ill-posed.
- **Deficiencies of Traditional Methods**:
  - Traditional Bayesian methods use a non-degenerate prior to handle ill-posedness, but the resulting posterior mean can diverge as the noise becomes small.
  - A fixed non-degenerate prior fails when the data perturbs the null space of the inversion operator.
- **Solution Proposed in the Paper**:
  - A data-adaptive RKHS prior is proposed so that the posterior mean remains stable as the noise becomes small.
  - The prior is validated through analysis and numerical experiments on learning both discrete and continuous kernels.

### Mathematical Representation

- **Loss Function**:

\[ E(\phi)=\frac{1}{2N\sigma^2_{\eta}}\sum_{k=1}^N\|R_{\phi}[u_k]-f_k\|^2_Y=\frac{1}{2\sigma^2_{\eta}}\left[\langle L_G\phi,\phi\rangle_{L^2_{\rho}}-2\langle\phi_D,\phi\rangle_{L^2_{\rho}}+C_f\right] \]

- **Posterior Mean**:
  - Posterior mean using the fixed non-degenerate prior \(N(0,Q_0)\):

\[ \mu_1=(L_G+\sigma^2_{\eta}Q_0^{-1})^{-1}\phi_D \]

  - Posterior mean using the data-adaptive RKHS prior \(N(0,\lambda^{-1}_*L_G)\), obtained by substituting \(Q_0^{-1}=\lambda_*L_G^{-1}\) on \(\text{Null}(L_G)^{\perp}\):

\[ \mu_{D1}=(L_G^2+\sigma^2_{\eta}\lambda_*I_{\text{Null}(L_G)^{\perp}})^{-1}L_G\phi_D \]

### Conclusion

By introducing the data-adaptive RKHS prior, the paper obtains a posterior mean that remains stable in the small-noise limit, and it verifies the prior's effectiveness for learning discrete and continuous kernels through numerical experiments. The method offers a new approach to ill-posed inverse problems.
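
The contrast between the two posterior means can be illustrated with a small finite-dimensional sketch. Everything below is an assumption made for illustration and is not taken from the paper: \(L^2_{\rho}\) is identified with \(\mathbb{R}^3\), \(L_G\) is a diagonal matrix with one zero eigenvalue, the fixed prior uses \(Q_0=I\) (so \(Q_0\) and its inverse coincide), and \(\lambda_*\) is held at 1 instead of being selected by the L-curve method. The function names `fixed_prior_mean` and `data_adaptive_mean` are hypothetical.

```python
import numpy as np

def fixed_prior_mean(L_G, phi_D, sigma, Q0_inv):
    """Posterior mean (L_G + sigma^2 * Q0^{-1})^{-1} phi_D for the prior N(0, Q0)."""
    return np.linalg.solve(L_G + sigma**2 * Q0_inv, phi_D)

def data_adaptive_mean(L_G, phi_D, sigma, lam, tol=1e-12):
    """Posterior mean (L_G^2 + sigma^2 * lam * I)^{-1} L_G phi_D,
    computed only on the orthogonal complement of Null(L_G)."""
    evals, evecs = np.linalg.eigh(L_G)   # L_G is symmetric positive semi-definite
    keep = evals > tol                   # discard the (numerical) null space
    s, V = evals[keep], evecs[:, keep]
    coef = s * (V.T @ phi_D) / (s**2 + sigma**2 * lam)
    return V @ coef

# Toy inversion operator with a one-dimensional null space (illustrative values).
L_G = np.diag([1.0, 0.1, 0.0])
phi_true = np.array([1.0, 2.0, 0.0])
# Small error that lands in Null(L_G), standing in for a model or discretization error.
perturbation = np.array([0.0, 0.0, 1e-3])
phi_D = L_G @ phi_true + perturbation

Q0_inv = np.eye(3)   # assumed fixed prior covariance Q0 = I
lam = 1.0            # assumed fixed hyper-parameter (not chosen by the L-curve here)

for sigma in [1e-1, 1e-2, 1e-3, 1e-4]:
    mu_fixed = fixed_prior_mean(L_G, phi_D, sigma, Q0_inv)
    mu_adapt = data_adaptive_mean(L_G, phi_D, sigma, lam)
    print(f"sigma={sigma:.0e}  "
          f"|mu_fixed|={np.linalg.norm(mu_fixed):.3e}  "
          f"|mu_adaptive|={np.linalg.norm(mu_adapt):.3e}")
```

In this toy setting the null-space component of the fixed-prior mean equals the perturbation divided by \(\sigma^2_{\eta}\), so its norm grows without bound as the noise shrinks, while the data-adaptive mean never leaves \(\text{Null}(L_G)^{\perp}\) and approaches `phi_true`.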