The Implicit Bias of Heterogeneity towards Invariance and Causality
Yang Xu,Yihong Gu,Cong Fang
2024-01-01
Abstract: It is observed empirically that large language models (LLMs), trained with
a variant of regression loss on numerous corpora from the Internet, can
unveil causal associations to some extent. This is contrary to the traditional
wisdom that “association is not causation” and the paradigm of traditional
causal inference in which prior causal knowledge should be carefully
incorporated into the design of methods. It is a mystery why causality, at a
higher level of understanding, can emerge from a regression task that pursues
associations. In this paper, we claim that the emergence of causality from
association-oriented training can be attributed to the coupling effects of
the heterogeneity of the source data, stochasticity of training algorithms, and
over-parameterization of the learning models. We illustrate such an intuition
using a simple but insightful model that learns invariance, a quasi-causality,
using regression loss. To be specific, we consider multi-environment low-rank
matrix sensing problems where the unknown rank-r ground-truth d × d matrices
diverge across the environments but contain a lower-rank invariant, causal
part. In this case, running pooled gradient descent will result in biased
solutions that only learn associations in general. We show that running
large-batch Stochastic Gradient Descent, in which each batch consists of linear
measurement samples randomly selected from a single environment, can
successfully drive the solution towards the invariant, causal solution under
certain conditions. This behavior is related to the relatively strong heterogeneity
of the environments, the large step size and noise in the optimization
algorithm, and the over-parameterization of the model. In summary, we unveil
another implicit bias that is a result of the symbiosis between the
heterogeneity of data and modern algorithms, which is, to the best of our
knowledge, the first of its kind in the literature.
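
The setup described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and it does not attempt to reproduce the paper's conditions or results. Everything below is an assumption made for illustration: the form of the ground truth (a shared low-rank projector plus an environment-specific spurious part with non-negative weights), the symmetric Gaussian measurements, the over-parameterized model U U^T with small initialization, fresh pooled mini-batches standing in for pooled gradient descent, and all names and hyperparameters (`make_environments`, `sample_batch`, `gd_step`, `train`, `lr`, `m`).

```python
import numpy as np

def make_environments(d=8, r_inv=1, r_spur=1, n_env=2, seed=0):
    """Hypothetical ground truth: A_e = A_inv + spurious part that varies with e."""
    rng = np.random.default_rng(seed)
    # Orthonormal directions, split into invariant and spurious subspaces.
    Q, _ = np.linalg.qr(rng.standard_normal((d, r_inv + r_spur)))
    U_inv, V_spur = Q[:, :r_inv], Q[:, r_inv:]
    A_inv = U_inv @ U_inv.T  # invariant ("causal") part shared by all environments
    envs = []
    for _ in range(n_env):
        w = rng.uniform(0.0, 2.0, size=r_spur)  # environment-specific weights (assumed)
        envs.append(A_inv + V_spur @ np.diag(w) @ V_spur.T)
    return A_inv, envs

def sample_batch(A, m, rng):
    """Draw m linear measurements y_i = <X_i, A> with symmetric Gaussian X_i."""
    d = A.shape[0]
    G = rng.standard_normal((m, d, d))
    X = (G + np.transpose(G, (0, 2, 1))) / 2.0
    y = np.einsum("mij,ij->m", X, A)
    return X, y

def gd_step(U, X, y, lr):
    """One gradient step on (1/2m) * sum_i (<X_i, U U^T> - y_i)^2 for symmetric X_i."""
    resid = np.einsum("mij,ij->m", X, U @ U.T) - y
    grad = 2.0 * np.einsum("m,mij->ij", resid, X) @ U / len(y)
    return U - lr * grad

def train(envs, mode, steps=300, m=200, lr=0.01, init_scale=1e-2, seed=1):
    """mode='pooled': each batch mixes all environments.
    mode='per_env': each batch comes from one randomly chosen environment."""
    rng = np.random.default_rng(seed)
    d = envs[0].shape[0]
    U = init_scale * rng.standard_normal((d, d))  # over-parameterized: U is d x d
    for _ in range(steps):
        if mode == "pooled":
            parts = [sample_batch(A, m // len(envs), rng) for A in envs]
            X = np.concatenate([p[0] for p in parts])
            y = np.concatenate([p[1] for p in parts])
        elif mode == "per_env":
            A = envs[rng.integers(len(envs))]
            X, y = sample_batch(A, m, rng)
        else:
            raise ValueError(f"unknown mode: {mode}")
        U = gd_step(U, X, y, lr)
    return U @ U.T

if __name__ == "__main__":
    A_inv, envs = make_environments()
    for mode in ("pooled", "per_env"):
        M = train(envs, mode)
        # Distance of the learned matrix from the invariant part.
        print(mode, np.linalg.norm(M - A_inv))
```

The two modes differ only in how each batch is assembled, which makes it easy to vary the step size, batch size, and spread of the spurious weights and compare how far each solution ends up from the invariant part. The default values here are conservative choices for numerical stability, not the regime analyzed in the paper.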