MLPs Learn In-Context on Regression and Classification Tasks

William L. Tong, Cengiz Pehlevan
2024-09-27
Abstract: In-context learning (ICL), the remarkable ability to solve a task from only input exemplars, is often assumed to be a unique hallmark of Transformer models. By examining commonly employed synthetic ICL tasks, we demonstrate that multi-layer perceptrons (MLPs) can also learn in-context. Moreover, MLPs, and the closely related MLP-Mixer models, learn in-context competitively with Transformers given the same compute budget in this setting. We further show that MLPs outperform Transformers on a series of classical tasks from psychology designed to test relational reasoning, which are closely related to in-context classification. These results underscore a need for studying in-context learning beyond attention-based architectures, while also challenging strong prior arguments about MLPs' limited ability to solve relational tasks. Altogether, our results highlight the unexpected competence of MLPs, and support the growing interest in all-MLP alternatives to task-specific architectures.
Machine Learning, Neural and Evolutionary Computing
What problem does this paper attempt to address?
The paper examines how multi-layer perceptrons (MLPs) perform on in-context learning (ICL) tasks. It addresses three core issues:

1. **Demonstrating the capability of MLPs in ICL tasks**: MLPs can learn in-context on commonly used synthetic ICL tasks and, given the same compute budget, are competitive with Transformers in this setting. This indicates that ICL is not limited to attention-based architectures.
2. **Challenging prior views on MLPs' ability to handle relational tasks**: MLPs outperform Transformers on a series of classical tasks from psychology designed to test relational reasoning, which challenges earlier arguments that MLPs have limited ability to solve relational tasks.
3. **Comparing architectures on ICL tasks**: By comparing MLPs, MLP-Mixers, and Transformers on regression and classification ICL tasks, the paper identifies strengths and weaknesses of each architecture under various conditions, and notes that MLPs surpass Transformers under certain specific conditions.

In summary, the paper reassesses the potential of MLPs for in-context learning and challenges previous views on their limitations, which motivates studying ICL beyond attention-based architectures.
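To make the regression setting more concrete, the following is a minimal sketch of how one synthetic in-context regression example might be built and passed to an MLP. It is an illustration under stated assumptions, not the authors' implementation: the linear task with Gaussian inputs, the flattening of the context into a single vector, and all names (`make_icl_regression_example`, `flatten_for_mlp`, `init_mlp`, `mlp_forward`) are hypothetical choices not given in the text above. The network is untrained, so its output only demonstrates the input and output shapes.

```python
import random


def make_icl_regression_example(n_context, dim, rng):
    """Build one in-context regression example (assumed linear task).

    A fresh weight vector is drawn for every example, so the mapping from
    x to y has to be inferred from the context pairs rather than memorized.
    """
    w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    xs = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_context + 1)]
    ys = [sum(wi * xi for wi, xi in zip(w, x)) for x in xs]
    context = list(zip(xs[:-1], ys[:-1]))  # exemplar (x, y) pairs
    query, target = xs[-1], ys[-1]         # held-out point to predict
    return context, query, target


def flatten_for_mlp(context, query):
    """Concatenate all (x, y) pairs and the query into one flat vector.

    Assumption: the MLP sees the whole context as a single fixed-size input.
    """
    flat = []
    for x, y in context:
        flat.extend(x)
        flat.append(y)
    flat.extend(query)
    return flat


def init_mlp(sizes, rng):
    """Randomly initialize a list of (weights, biases), one pair per layer."""
    layers = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        scale = (2.0 / fan_in) ** 0.5
        weights = [[rng.gauss(0.0, scale) for _ in range(fan_in)]
                   for _ in range(fan_out)]
        layers.append((weights, [0.0] * fan_out))
    return layers


def mlp_forward(inputs, layers):
    """Forward pass of a ReLU MLP with a linear output layer."""
    h = inputs
    for i, (weights, biases) in enumerate(layers):
        h = [sum(w * v for w, v in zip(row, h)) + b
             for row, b in zip(weights, biases)]
        if i < len(layers) - 1:
            h = [max(0.0, v) for v in h]  # ReLU on hidden layers only
    return h


rng = random.Random(0)
context, query, target = make_icl_regression_example(n_context=8, dim=4, rng=rng)
flat = flatten_for_mlp(context, query)
mlp = init_mlp([len(flat), 32, 1], rng)
prediction = mlp_forward(flat, mlp)[0]  # untrained: not expected to match target
print(len(flat), prediction, target)
```

In a sketch like this, training would repeat the sampling step with a new weight vector each time and minimize the squared error between `prediction` and `target`, so that the network has to read the task off the context rather than store any single mapping.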