Autoregressive Large Language Models are Computationally Universal

Dale Schuurmans, Hanjun Dai, Francesco Zanini
2024-10-04
Abstract: We show that autoregressive decoding of a transformer-based language model can realize universal computation, without external intervention or modification of the model's weights. Establishing this result requires understanding how a language model can process arbitrarily long inputs using a bounded context. For this purpose, we consider a generalization of autoregressive decoding where, given a long input, emitted tokens are appended to the end of the sequence as the context window advances. We first show that the resulting system corresponds to a classical model of computation, a Lag system, that has long been known to be computationally universal. By leveraging a new proof, we show that a universal Turing machine can be simulated by a Lag system with 2027 production rules. We then investigate whether an existing large language model can simulate the behaviour of such a universal Lag system. We give an affirmative answer by showing that a single system-prompt can be developed for gemini-1.5-pro-001 that drives the model, under deterministic (greedy) decoding, to correctly apply each of the 2027 production rules. We conclude that, by the Church-Turing thesis, prompted gemini-1.5-pro-001 with extended autoregressive (greedy) decoding is a general purpose computer.
Computation and Language
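The generalized decoding described in the abstract can be pictured as a loop over a growing token sequence. The following is a minimal, hypothetical Python sketch, not the authors' implementation. The names `extended_decode` and `model` are illustrative, and `model` stands in for a greedy (deterministic) LLM call on the tokens currently in view. It assumes that the window advances by one position per step and that the model may emit zero or more tokens each step.

```python
from typing import Callable, List, Sequence

# Assumption: a greedy (deterministic) model is treated as a plain function
# from the tokens inside the context window to the tokens it emits.
Model = Callable[[Sequence[str]], List[str]]


def extended_decode(tokens: List[str], model: Model, window: int, max_steps: int) -> List[str]:
    """Hypothetical sketch of extended autoregressive decoding.

    The context window covers seq[pos:pos + window]. At each step the model's
    output is appended to the END of the sequence and the window advances by
    one position. Decoding stops when the window can no longer be filled or
    after max_steps steps (a universal computation need not halt).
    Returns the full sequence, including tokens the window has moved past.
    """
    seq = list(tokens)
    pos = 0
    for _ in range(max_steps):
        if pos + window > len(seq):
            break  # too few tokens remain ahead of the window
        seq.extend(model(seq[pos:pos + window]))
        pos += 1  # slide the window forward by one token
    return seq


# Toy model (an assumption for illustration): re-emit the first token in view.
trace = extended_decode(list("abc"), lambda w: [w[0]], window=2, max_steps=3)
print("".join(trace))  # prints "abcabc"
```

In this toy run, the model only ever sees two tokens at a time. Because its output is appended at the end while the window slides forward, the input `abc` is recirculated, and a bounded context eventually reaches every token of an arbitrarily long sequence. The `max_steps` budget is there only so the sketch terminates.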
What problem does this paper attempt to address?
### Problems Addressed by the Paper

This paper asks whether a Transformer-based large language model (LLM) can realize universal computation through autoregressive decoding alone, without external intervention or modification of the model's weights.

### Main Contributions

1. **General Computational Capability**: The authors show that, under an extended form of autoregressive decoding, a large language model can simulate a universal Turing machine (UTM) and is therefore computationally universal.
2. **Lag System**: The authors show that extended autoregressive decoding corresponds to a Lag system, a classical model of computation long known to be computationally universal. They also give a new proof that a UTM can be simulated by a Lag system with 2027 production rules. This correspondence is what maps the computation of a UTM onto the autoregressive decoding of a language model.
3. **Specific Implementation**: The authors develop a single system prompt that drives gemini-1.5-pro-001, under deterministic (greedy) decoding, to correctly apply each of the 2027 production rules, thereby simulating the UTM.

### Method Overview

1. **Extension of Autoregressive Decoding**: To process inputs of arbitrary length with a bounded context window, the authors generalize autoregressive decoding: emitted tokens are appended to the end of the sequence as the context window advances.
2. **Definition and Properties of the Lag System**: A Lag system is a simple computational model that repeatedly rewrites a memory string according to a fixed set of production rules. The authors prove that a Lag system can simulate bidirectional memory access, which is key to achieving general computation.
3. **Simulation of UTM**: The authors convert the computation of a specific UTM, U15,2, into the production rules of a Lag system. They then map those rules onto the decoding process of the language model.

### Conclusion

Using these methods, the authors show that prompted gemini-1.5-pro-001, under extended autoregressive (greedy) decoding, can simulate the execution of U15,2 on any input. By the Church-Turing thesis, it is therefore a general-purpose computer. The result establishes that an existing large language model is computationally universal without any change to its weights. It also offers a new perspective for future research on the computational power of language models.
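The Lag system mentioned in the Method Overview can be made concrete with a small interpreter. This sketch follows the classical definition as we understand it. At each step, the first `lag` symbols select a production, the first symbol is deleted, and the production's output is appended to the end of the memory string. The function name `run_lag_system` and the toy two-symbol rules are hypothetical illustrations. They are not the paper's 2027-rule system for U15,2.

```python
from collections import deque
from itertools import islice
from typing import Dict, List, Tuple

Rules = Dict[Tuple[str, ...], List[str]]


def run_lag_system(rules: Rules, memory: List[str], lag: int, max_steps: int) -> Tuple[List[str], bool]:
    """Run a lag system that reads its first `lag` symbols at each step.

    The leading `lag` symbols select a production; the first symbol is then
    deleted and the production is appended to the end of the memory string.
    Returns (final memory, halted), where halted is True if fewer than `lag`
    symbols remain or no rule matches within max_steps steps.
    """
    mem = deque(memory)  # the memory string behaves like a FIFO queue
    for _ in range(max_steps):
        if len(mem) < lag:
            return list(mem), True  # too few symbols to select a rule: halt
        production = rules.get(tuple(islice(mem, lag)))
        if production is None:
            return list(mem), True  # no rule applies: halt
        mem.popleft()  # delete the first symbol
        mem.extend(production)  # append the production's output
    return list(mem), False  # step budget exhausted without halting


# Toy lag-2 rules over {a, b}; hypothetical, not taken from the paper.
toy_rules: Rules = {("a", "b"): ["b"], ("b", "b"): []}
print(run_lag_system(toy_rules, list("abb"), lag=2, max_steps=10))  # prints (['b'], True)
```

A `deque` makes deleting from the front and appending at the back cheap. This mirrors the queue-like behaviour that lets a Lag system line up with decoding in which the context window advances while emitted tokens are appended at the end.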