Abstract: Differentiable programming for scientific machine learning (SciML) has recently seen considerable interest and success, as it directly embeds neural networks inside PDEs derived from first-principles physics; such models are often called NeuralPDEs. As a result, there is a widespread assumption in the community that NeuralPDEs are more trustworthy and generalizable than black-box models. However, like any SciML model, differentiable programming relies predominantly on high-quality PDE simulations as "ground truth" for training, and mathematics dictates that these simulations are only discrete numerical approximations of the true physics. We therefore ask: are NeuralPDEs and differentiable programming models trained on PDE simulations as physically interpretable as we think? In this work, we rigorously attempt to answer this question using established ideas from numerical analysis, experiments, and analysis of model Jacobians. Our study shows that NeuralPDEs learn the artifacts in the simulation training data that arise from the truncation error of the discretized Taylor series used for the spatial derivatives. Additionally, NeuralPDE models are systematically biased, and their generalization capability is likely enabled by a fortuitous interplay of numerical dissipation and truncation error in the training dataset and the NeuralPDE, which seldom happens in practical applications. This bias manifests aggressively even in relatively accessible 1-D equations, raising concerns about the veracity of differentiable programming on complex, high-dimensional, real-world PDEs, and about dataset integrity for foundation models. Further, we observe that the initial condition constrains the truncation error in initial-value problems for PDEs, thereby limiting extrapolation. Finally, we demonstrate that an eigenanalysis of model weights can indicate a priori whether the model will be inaccurate in out-of-distribution testing.
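To make the notion of spatial truncation error concrete, the following is a minimal sketch and is not taken from the paper's experiments. It assumes a second-order central difference applied to sin(x) at a single point; the function names `central_difference` and `truncation_error` are hypothetical. It measures how the error of the discrete derivative shrinks as the grid spacing h is halved, which is the kind of discretization artifact a model trained on simulation data could in principle absorb.

```python
import numpy as np

def central_difference(f, x, h):
    """Second-order central difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def truncation_error(h, x=1.0):
    """Absolute error of the central difference for f = sin at point x."""
    return abs(central_difference(np.sin, x, h) - np.cos(x))

# Illustrative grid spacings (assumed values, each half the previous one).
steps = [0.1, 0.05, 0.025]
errors = [truncation_error(h) for h in steps]

# Observed order of accuracy: log2 of the error ratio between successive spacings.
orders = [np.log2(errors[i] / errors[i + 1]) for i in range(len(errors) - 1)]

# Leading Taylor-series term of the error: (h^2 / 6) * |f'''(x)|, with f''' = -cos.
leading_term = steps[0] ** 2 / 6.0 * abs(np.cos(1.0))

print("errors:", errors)
print("observed orders:", orders)
print("leading-term estimate at h = 0.1:", leading_term)
```

The observed order should sit close to 2, matching the leading h^2 term of the Taylor expansion. The error is small but never zero at finite h, so "ground truth" produced on a grid always carries a scheme-dependent residual.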

Imitating Deep Learning Dynamics via Locally Elastic Stochastic Differential Equations

Learning effective stochastic differential equations from microscopic simulations: Linking stochastic numerics to deep learning

The Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations, and Anomalous Diffusion

Learning effective stochastic differential equations from microscopic simulations: combining stochastic numerics and deep learning

Dynamics of Local Elasticity During Training of Neural Nets

Understanding Short-Range Memory Effects in Deep Neural Networks

Dynamic of Stochastic Gradient Descent with State-Dependent Noise

Locally Regularized Neural Differential Equations: Some Black Boxes Were Meant to Remain Closed!

Revisiting the Characteristics of Stochastic Gradient Noise and Dynamics

Multi-scale Feature Learning Dynamics: Insights for Double Descent

On Multi-Stage Loss Dynamics in Neural Networks: Mechanisms of Plateau and Descent Stages

Training Dynamics of Deep Network Linear Regions

Identifying Drift, Diffusion, and Causal Structure from Temporal Snapshots

Learning time-scales in two-layers neural networks

Infinitely Deep Bayesian Neural Networks with Stochastic Differential Equations

What You See is Not What You Get: Neural Partial Differential Equations and The Illusion of Learning

Learning stochastic dynamical systems with neural networks mimicking the Euler-Maruyama scheme

Stochastic Collapse: How Gradient Noise Attracts SGD Dynamics Towards Simpler Subnetworks

Machine learning in and out of equilibrium

Phase diagram of early training dynamics in deep neural networks: effect of the learning rate, depth, and width