Abstract: It has been observed empirically that large language models (LLMs), trained with a variant of regression loss on large corpora from the Internet, can unveil causal associations to some extent. This runs contrary to the traditional wisdom that “association is not causation” and to the paradigm of traditional causal inference, in which prior causal knowledge should be carefully incorporated into the design of methods. It is a mystery why causality, which sits at a higher layer of understanding, can emerge from a regression task that pursues associations. In this paper, we claim that the emergence of causality from association-oriented training can be attributed to the coupling effects of the heterogeneity of the source data, the stochasticity of training algorithms, and the over-parameterization of the learning models. We illustrate this intuition using a simple but insightful model that learns invariance, a quasi-causality, using regression loss. Specifically, we consider multi-environment low-rank matrix sensing problems in which the unknown rank-r ground-truth d×d matrices differ across environments but share a lower-rank invariant, causal part. In this case, running pooled gradient descent yields biased solutions that, in general, learn only associations. We show that running large-batch Stochastic Gradient Descent, in which each batch consists of linear measurement samples randomly selected from one environment, can successfully drive the solution towards the invariant, causal solution under certain conditions. This step is related to the relatively strong heterogeneity of the environments, the large step size and noise in the optimization algorithm, and the over-parameterization of the model. In summary, we unveil another implicit bias that results from the symbiosis between the heterogeneity of data and modern algorithms, which is, to the best of our knowledge, the first of its kind in the literature.
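To make the setup in the abstract concrete, the following is a minimal NumPy sketch of multi-environment matrix sensing with per-environment batches. It is an illustration under stated assumptions, not the paper's exact model or algorithm: the additive split of each environment's matrix into a shared part plus an environment-specific part, the symmetric factorization U Uᵀ, Gaussian measurement matrices, the small random initialization, and every name and default (`make_environments`, `loss`, `batch_grad`, `env_sgd`, `lr`, `batch`, `init_scale`) are hypothetical choices made for this example.

```python
import numpy as np


def make_environments(d=8, r_inv=1, r_spur=1, n_envs=2, n_samples=200, seed=0):
    """Build toy environments (assumed form): A_e = A_inv + environment-specific part."""
    rng = np.random.default_rng(seed)
    U_inv = rng.normal(size=(d, r_inv)) / np.sqrt(d)
    A_inv = U_inv @ U_inv.T                      # shared, invariant low-rank part
    envs = []
    for _ in range(n_envs):
        U_e = rng.normal(size=(d, r_spur)) / np.sqrt(d)
        A_e = A_inv + U_e @ U_e.T                # ground truth differs per environment
        X = rng.normal(size=(n_samples, d, d))   # Gaussian sensing matrices (assumption)
        y = np.einsum("nij,ij->n", X, A_e)       # linear measurements y_i = <X_i, A_e>
        envs.append((X, y))
    return A_inv, envs


def loss(U, X, y):
    """Squared regression loss 1/(2m) * sum_i (<X_i, U U^T> - y_i)^2."""
    resid = np.einsum("nij,ij->n", X, U @ U.T) - y
    return 0.5 * np.mean(resid ** 2)


def batch_grad(U, X, y):
    """Gradient of `loss` with respect to the factor U."""
    resid = np.einsum("nij,ij->n", X, U @ U.T) - y
    G = np.einsum("n,nij->ij", resid, X) / len(y)
    return (G + G.T) @ U


def env_sgd(envs, d, steps=500, lr=0.05, batch=50, init_scale=1e-3, seed=1):
    """SGD where every batch is drawn from one randomly chosen environment."""
    rng = np.random.default_rng(seed)
    U = init_scale * rng.normal(size=(d, d))     # over-parameterized d x d factor
    for _ in range(steps):
        X, y = envs[rng.integers(len(envs))]     # pick a single environment
        idx = rng.choice(len(y), size=batch, replace=False)
        U = U - lr * batch_grad(U, X[idx], y[idx])
    return U @ U.T                               # current estimate of the matrix
```

Calling `A_inv, envs = make_environments()` followed by `A_hat = env_sgd(envs, d=8)` gives an estimate that can be compared with `A_inv`, for example via `np.linalg.norm(A_hat - A_inv)`. The sketch makes no claim about when, or whether, this estimate approaches the invariant part; the abstract states that this happens only under certain conditions on heterogeneity, step size, noise, and over-parameterization. Concatenating all environments and taking full-gradient steps with the same `batch_grad` would give a pooled gradient descent baseline for comparison.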

Implicit Regression: Detecting Constants and Inverse Relationships with Bivariate Random Error

Introduction to Implicit Regression

Regressions with Berkson errors in covariates - A nonparametric approach

Lattice Designs in Standard and Simple Implicit Multi-linear Regression

Indirect multivariate response linear regression

Inference for High-Dimensional Linear Expectile Regression with De-Biasing Method

Bayesian variable selection in linear regression models with instrumental variables

Hidden Variable Discovery Based on Regression and Entropy

On the Ambiguity of Interaction and Nonlinear Main Effects in a Regime of Dependent Covariates

High-Dimensional Linear Regression via Implicit Regularization

Regression from Dependent Observations

Nonlinear Regression with Residuals: Causal Estimation with Time-varying Treatments and Covariates

The Implicit Bias of Heterogeneity towards Invariance and Causality

Minimax Instrumental Variable Regression and L_2 Convergence Guarantees Without Identification or Closedness

Prediction regions through Inverse Regression

Implicit predictors in regularized data-driven predictive control

Inverse Modeling: A Strategy to Cope with Non-linearity

Elastic-net Regularized High-dimensional Negative Binomial Regression: Consistency and Weak Signals Detection

Understanding Implicit Regularization in Over-Parameterized Single Index Model

Regression with an Imputed Dependent Variable