Abstract: We show that autoregressive decoding of a transformer-based language model can realize universal computation, without external intervention or modification of the model's weights. Establishing this result requires understanding how a language model can process arbitrarily long inputs using a bounded context. For this purpose, we consider a generalization of autoregressive decoding where, given a long input, emitted tokens are appended to the end of the sequence as the context window advances. We first show that the resulting system corresponds to a classical model of computation, a Lag system, that has long been known to be computationally universal. By leveraging a new proof, we show that a universal Turing machine can be simulated by a Lag system with 2027 production rules. We then investigate whether an existing large language model can simulate the behaviour of such a universal Lag system. We give an affirmative answer by showing that a single system-prompt can be developed for gemini-1.5-pro-001 that drives the model, under deterministic (greedy) decoding, to correctly apply each of the 2027 production rules. We conclude that, by the Church-Turing thesis, prompted gemini-1.5-pro-001 with extended autoregressive (greedy) decoding is a general purpose computer.
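To make the decoding scheme in the abstract concrete, here is a minimal Python sketch of a Lag system step. It assumes the usual formulation: a rule is matched against the first `n` symbols of the memory string, its output is appended to the end, and only the first symbol is deleted. The names `lag_step`, `run_lag`, and `TOY_RULES` are illustrative, and the three toy rules are hypothetical; they are not the 2027 rules from the paper.

```python
from collections import deque

# Hypothetical toy rule table for a Lag system with context length n = 2.
# Keys are the symbols to match at the front; values are the strings to append.
TOY_RULES = {
    ("a", "a"): "b",
    ("a", "b"): "a",
    ("b", "a"): "",
}

def lag_step(memory, rules, n=2):
    """Apply one step in place; return False when no rule matches."""
    if len(memory) < n:
        return False
    prefix = tuple(memory[i] for i in range(n))
    output = rules.get(prefix)
    if output is None:
        return False
    memory.extend(output)  # emitted symbols go to the end of the sequence
    memory.popleft()       # the context window advances by one symbol
    return True

def run_lag(initial, rules, n=2, max_steps=1000):
    """Run until no rule applies or max_steps is reached; return the memory string."""
    memory = deque(initial)
    for _ in range(max_steps):
        if not lag_step(memory, rules, n):
            break
    return "".join(memory)

print(run_lag("aab", TOY_RULES))  # aab -> abb -> bba, then no rule matches
```

In the setting the abstract describes, the lookup `rules.get(prefix)` would be replaced by a call to the prompted language model, which reads the current window and emits the tokens to append.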

What problem does this paper attempt to address?

### Problems Addressed by the Paper

This paper asks whether a large language model (LLM) can perform universal computation through autoregressive decoding alone. Specifically, the authors aim to show that a Transformer-based language model can realize universal computation without external intervention or modification of the model's weights.

### Main Contributions

1. **General Computational Capability**: The authors show that, under an extended form of autoregressive decoding, a large language model can simulate a universal Turing machine (UTM) and is therefore computationally universal.
2. **Lag System**: The authors show that extended autoregressive decoding corresponds to a Lag system, a classical model of computation that has long been known to be computationally universal. Using a new proof, they show that a UTM can be simulated by a Lag system with 2027 production rules, which maps the computation of a UTM onto the autoregressive decoding process of a language model.
3. **Specific Implementation**: The authors develop a single system prompt that drives the gemini-1.5-pro-001 model, under deterministic (greedy) decoding, to correctly apply each of the 2027 production rules, thereby simulating the behavior of a UTM.

### Method Overview

1. **Extension of Autoregressive Decoding**: The authors consider a generalization of autoregressive decoding that handles inputs longer than the bounded context. Given a long input, emitted tokens are appended to the end of the sequence as the context window advances.
2. **Definition and Properties of the Lag System**: The Lag system is a simple computational model that operates on a memory string by applying production rules. The authors prove that a Lag system can simulate bidirectional memory access, which is key to achieving general computation.
3. **Simulation of UTM**: The authors convert the computation of a specific UTM ($U_{15,2}$) into the production rules of a Lag system, then map these rules onto the decoding process of the language model.

### Conclusion

The authors demonstrate that the gemini-1.5-pro-001 model, prompted appropriately and run with extended autoregressive (greedy) decoding, can simulate the execution of $U_{15,2}$ on any input. By the Church-Turing thesis, they conclude that the prompted model is a general purpose computer. The result is evidence of the computational potential of large language models and offers a new perspective for future research.
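The rule-by-rule check described under Specific Implementation can be sketched as a small harness that queries a model once per production rule and records any mismatch. This is only an illustration under stated assumptions: `check_rules`, `faithful_model`, and `faulty_model` are hypothetical names, the models are stub functions standing in for a deterministically decoded LLM, and the two-rule table is a toy, not the paper's 2027 rules.

```python
from typing import Callable, Dict, List, Tuple

Prefix = Tuple[str, str]

# Hypothetical toy rule table: matched prefix -> expected emitted string.
TOY_RULES: Dict[Prefix, str] = {
    ("a", "a"): "b",
    ("a", "b"): "a",
}

def check_rules(model: Callable[[Prefix], str],
                rules: Dict[Prefix, str]) -> List[Prefix]:
    """Return every rule prefix on which the model's output differs from the table."""
    return [prefix for prefix, expected in rules.items()
            if model(prefix) != expected]

def faithful_model(prefix: Prefix) -> str:
    # Stub that always reproduces the rule table.
    return TOY_RULES[prefix]

def faulty_model(prefix: Prefix) -> str:
    # Stub that ignores its input and always emits "b".
    return "b"

print(check_rules(faithful_model, TOY_RULES))  # []
print(check_rules(faulty_model, TOY_RULES))    # [('a', 'b')]
```

Because the rule set is finite and decoding is deterministic, an empty mismatch list over the whole table is enough to establish that the model applies every rule correctly.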

Autoregressive Large Language Models are Computationally Universal

Memory Augmented Large Language Models are Computationally Universal

Learning to Decode for Future Success

Auto-Regressive Next-Token Predictors are Universal Learners

Large Language Models and the Extended Church-Turing Thesis

Universal Length Generalization with Turing Programs

Think Big, Generate Quick: LLM-to-SLM for Fast Autoregressive Decoding

Dynamic Universal Approximation Theory: The Basic Theory for Transformer-based Large Language Models

Arithmetic with language models: From memorization to computation

Embers of autoregression show how large language models are shaped by the problem they are trained to solve

Large Language Models as General Pattern Machines

Embers of Autoregression: Understanding Large Language Models Through the Problem They are Trained to Solve

On the Representational Capacity of Recurrent Neural Language Models

Large Language Models as Markov Chains

Memory-Augmenting Decoder-Only Language Models through Encoders (Student Abstract)

A Sentence is Worth a Thousand Pictures: Can Large Language Models Understand Hum4n L4ngu4ge and the W0rld behind W0rds?

CLLMs: Consistency Large Language Models

How Powerful are Decoder-Only Transformer Neural Models?

Large Language Models for Mathematicians

L2MAC: Large Language Model Automatic Computer for Extensive Code Generation