MLPs Learn In-Context on Regression and Classification Tasks

William L. Tong, Cengiz Pehlevan

2024-09-27

Abstract: In-context learning (ICL), the remarkable ability to solve a task from only input exemplars, is often assumed to be a unique hallmark of Transformer models. By examining commonly employed synthetic ICL tasks, we demonstrate that multi-layer perceptrons (MLPs) can also learn in-context. Moreover, MLPs, and the closely related MLP-Mixer models, learn in-context competitively with Transformers given the same compute budget in this setting. We further show that MLPs outperform Transformers on a series of classical tasks from psychology designed to test relational reasoning, which are closely related to in-context classification. These results underscore a need for studying in-context learning beyond attention-based architectures, while also challenging strong prior arguments about MLPs' limited ability to solve relational tasks. Altogether, our results highlight the unexpected competence of MLPs, and support the growing interest in all-MLP alternatives to task-specific architectures.

Machine Learning, Neural and Evolutionary Computing

What problem does this paper attempt to address?

The paper examines how multi-layer perceptrons (MLPs) perform on in-context learning (ICL) tasks and addresses three core questions:

1. **Demonstrating the capability of MLPs in ICL tasks**: On commonly used synthetic ICL tasks, MLPs can learn in-context and are competitive with Transformer models given the same compute budget. This indicates that ICL is not limited to attention-based architectures.
2. **Challenging traditional views on MLPs' ability to handle relational tasks**: MLPs outperform Transformer models on a series of classical psychology tasks designed to test relational reasoning, which runs against prior arguments that MLPs are poorly suited to relational tasks.
3. **Exploring performance differences between architectures in ICL tasks**: By comparing MLPs, MLP-Mixers, and Transformers on regression and classification ICL tasks, the paper identifies where each architecture is stronger or weaker, and notes that MLPs surpass Transformers under some conditions.

In summary, the paper reassesses the potential of MLPs for in-context learning and challenges earlier views about their limitations, motivating the study of ICL beyond attention-based architectures.
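To make the regression setting more concrete, here is a minimal sketch of what a synthetic in-context regression example might look like. This page does not specify the task construction, so everything below is an assumption for illustration: the noiseless linear target, the Gaussian sampling, and the hypothetical helper names `make_icl_regression_example` and `flatten_for_mlp`. Flattening the exemplars into one vector is shown as one plausible way to present a context to an MLP, not necessarily the authors' exact setup.

```python
import random


def make_icl_regression_example(n_exemplars=4, dim=2, seed=0):
    """Build one synthetic in-context regression example (illustrative only).

    A fresh weight vector is drawn for each example, so the mapping from
    x to y can only be inferred from the exemplars given in the context.
    """
    rng = random.Random(seed)
    # Assumption: the hidden task is a noiseless linear function.
    weights = [rng.gauss(0.0, 1.0) for _ in range(dim)]

    def target(x):
        return sum(w * xi for w, xi in zip(weights, x))

    exemplars = []
    for _ in range(n_exemplars):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        exemplars.append((x, target(x)))

    # The model must predict the label of this query from the exemplars.
    query = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    return exemplars, query, target(query)


def flatten_for_mlp(exemplars, query):
    """Concatenate all (x, y) exemplar pairs and the query into one vector."""
    flat = []
    for x, y in exemplars:
        flat.extend(x)
        flat.append(y)
    flat.extend(query)
    return flat


exemplars, query, answer = make_icl_regression_example()
mlp_input = flatten_for_mlp(exemplars, query)
# Input length is n_exemplars * (dim + 1) + dim.
print(len(mlp_input))
```

Under this sketch, a sequence model would instead receive the exemplars as separate tokens, while the MLP sees a single fixed-length vector, so its input size grows with the number of exemplars.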

Context-Scaling versus Task-Scaling in In-Context Learning

In-Context Learning with Representations: Contextual Generalization of Trained Transformers

How Do Transformers Learn In-Context Beyond Simple Functions? A Case Study on Learning with Representations

What Can Transformers Learn In-Context? A Case Study of Simple Function Classes

How Do Nonlinear Transformers Learn and Generalize in In-Context Learning?

Towards Understanding How Transformers Learn In-context Through a Representation Learning Lens

Does learning the right latent variables necessarily improve in-context learning?

Transformers are Minimax Optimal Nonparametric In-Context Learners

What and How does In-Context Learning Learn? Bayesian Model Averaging, Parameterization, and Generalization

Asymptotic theory of in-context learning by linear attention

Transformer In-Context Learning for Categorical Data

On the Role of Depth and Looping for In-Context Learning with Task Diversity

In-Context In-Context Learning with Transformer Neural Processes

Understanding In-Context Learning in Transformers and LLMs by Learning to Learn Discrete Functions

In-context Learning in Presence of Spurious Correlations

Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models

Pretrained transformer efficiently learns low-dimensional target functions in-context

Transformers learn variable-order Markov chains in-context

Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection

In-Context Language Learning: Architectures and Algorithms