Abstract: We present evidence that language models (LMs) of code can learn to represent the formal semantics of programs, despite being trained only to perform next-token prediction. Specifically, we train a Transformer model on a synthetic corpus of programs written in a domain-specific language for navigating 2D grid world environments. Each program in the corpus is preceded by a (partial) specification in the form of several input-output grid world states. Despite providing no further inductive biases, we find that a probing classifier is able to extract increasingly accurate representations of the unobserved, intermediate grid world states from the LM hidden states over the course of training, suggesting the LM acquires an emergent ability to interpret programs in the formal sense. We also develop a novel interventional baseline that enables us to disambiguate what is represented by the LM as opposed to learned by the probe. We anticipate that this technique may be generally applicable to a broad range of semantic probing experiments. In summary, this paper does not propose any new techniques for training LMs of code, but develops an experimental framework for and provides insights into the acquisition and representation of formal semantics in statistical models of code. Our code is available at https://github.com/charlesjin/emergent-semantics.
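Roughly, the interventional baseline works like this: the same program text is labeled twice, once with states computed under the original semantics and once under an alternative semantics, and a probe is trained on each set of labels. If the LM hidden states encode the original semantics, the probe should fit the original labels noticeably better than the alternative ones. The sketch below is a minimal illustration, not the paper's implementation. It only builds the two label sets for a toy, direction-only state and omits the LM and the probe entirely. The token meanings in `ALTERNATIVE` and `SWAPPED`, and the helper names `label_trace` and `is_relabeling`, are hypothetical. The sketch also checks a pitfall we assume matters in practice: if the alternative labels are just a relabeling of the original ones, a probe could recover both equally well.

```python
from collections import defaultdict

# Facing directions in clockwise order.
DIRS = ["N", "E", "S", "W"]


def rotate(d, quarter_turns):
    """Rotate direction d clockwise by the given number of quarter turns."""
    return DIRS[(DIRS.index(d) + quarter_turns) % 4]


# Original meaning of each token, restricted to the robot's facing direction.
ORIGINAL = {
    "turnLeft": lambda d: rotate(d, -1),
    "turnRight": lambda d: rotate(d, 1),
    "move": lambda d: d,
}

# Hypothetical alternative semantics: same tokens, different meanings.
ALTERNATIVE = {
    "turnLeft": lambda d: rotate(d, 2),   # turn around
    "turnRight": lambda d: rotate(d, 1),  # unchanged
    "move": lambda d: rotate(d, 1),       # now also turns right
}

# A tempting but weak intervention: simply swap the two turns.
SWAPPED = {
    "turnLeft": ORIGINAL["turnRight"],
    "turnRight": ORIGINAL["turnLeft"],
    "move": ORIGINAL["move"],
}


def label_trace(program, semantics, start="N"):
    """Probe targets: the facing direction after each token under `semantics`."""
    labels, d = [], start
    for token in program:
        d = semantics[token](d)
        labels.append(d)
    return labels


def is_relabeling(programs, sem_a, sem_b):
    """True if, on these programs, every sem_a label determines the sem_b label.

    In that case a probe that decodes sem_a states from the hidden states could
    also fit sem_b labels through a fixed mapping, so comparing the two probes
    would not separate what the LM represents from what the probe learns.
    """
    seen = defaultdict(set)
    for program in programs:
        for a, b in zip(label_trace(program, sem_a), label_trace(program, sem_b)):
            seen[a].add(b)
    return all(len(bs) == 1 for bs in seen.values())


if __name__ == "__main__":
    programs = [["move", "turnLeft", "move"], ["turnRight", "turnLeft", "move"]]
    print(label_trace(programs[0], ORIGINAL))              # ['N', 'W', 'W']
    print(label_trace(programs[0], ALTERNATIVE))           # ['E', 'W', 'N']
    print(is_relabeling(programs, ORIGINAL, SWAPPED))      # True
    print(is_relabeling(programs, ORIGINAL, ALTERNATIVE))  # False
```

Swapping the two turns looks like a natural intervention, but with a fixed start direction it only permutes the original labels, so it cannot tell the LM and the probe apart. Changing what `move` does breaks that correspondence on these example programs.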


### What problem does this paper attempt to solve?

This paper explores whether language models (LMs) trained only for next-token prediction can learn and represent the formal semantics of programs. Specifically, the authors test the following hypothesis:

**Main Hypothesis (MH):** A language model trained only to perform next-token prediction does not model the formal semantics of its underlying programming language.

To test this hypothesis, the authors carry out the following research:

1. **Experimental setup**:
   - Use a synthetic corpus of programs, written in a domain-specific language, for navigating a 2D grid world.
   - Precede each program with a partial specification consisting of several input-output grid world states.
   - Train a Transformer model on next-token prediction, then use a small classifier (a probe) to extract information about the unobserved, intermediate grid world states from the LM hidden states.
2. **Research method**:
   - Track whether the LM gradually acquires the ability to interpret the formal semantics of programs over the course of training.
   - Develop a novel interventional baseline that distinguishes what is represented by the LM from what is learned by the probe, to check that the semantics extracted by the probe are actually represented in the LM rather than learned by the probe itself.
3. **Research results**:
   - As training progresses, the probe's ability to extract intermediate states improves substantially, and this improvement closely tracks the LM's ability to generate correct programs.
   - The interventional experiments further indicate that the LM hidden states do not merely record program syntax for the probe to interpret, but encode the program semantics themselves.

### Summary

The main contribution of this paper is evidence that, although language models are trained only on next-token prediction, they can still spontaneously learn the formal semantics of programs.
This challenges the view that language models can only generate text based on surface-level statistical correlations. In addition, the authors propose a new interventional technique that more precisely separates the roles of the LM and the probe in semantic extraction.

### Formula Display

Some of the formulas involved in the paper are as follows:

- Program state transition, where \((\text{state}_{\text{prog}})_i\) is the program state after executing the \(i\)-th token:

  \[ (\text{state}_{\text{prog}})_i = \text{exec}\left(\text{token}_i,\ (\text{state}_{\text{prog}})_{i-1}\right) \]

- Abstract interpretation, where the abstraction function \(\alpha\) maps a concrete program state to the features being probed:

  \[ \alpha : \text{state}_{\text{prog}} \to (\text{position}, \text{direction}, \text{obstacle}) \]

These formulas show how the program state changes as each token is executed and how key features are abstracted from the concrete state.
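The transition and abstraction formulas can be made concrete with a small interpreter. The sketch below is a minimal illustration, not the paper's implementation: the `State` class, the restriction to three tokens (`move`, `turnLeft`, `turnRight`), the treatment of a blocked `move` as a no-op, and the names `exec_token`, `alpha`, and `abstract_trace` are all assumptions made for this example. Here `obstacle` is read as "the cell in front of the robot is a wall or lies off the grid".

```python
from dataclasses import dataclass, replace

# Facing directions in clockwise order and the (dx, dy) step for each.
# Assumption: y grows downward, as in row-major grid indexing.
DIRECTIONS = ["north", "east", "south", "west"]
STEPS = {"north": (0, -1), "east": (1, 0), "south": (0, 1), "west": (-1, 0)}


@dataclass(frozen=True)
class State:
    """Hypothetical concrete program state for a small Karel-like grid world."""
    x: int
    y: int
    direction: str
    walls: frozenset  # cells the robot cannot enter
    width: int
    height: int


def ahead(s):
    """Grid cell directly in front of the robot."""
    dx, dy = STEPS[s.direction]
    return s.x + dx, s.y + dy


def blocked(s):
    """True if the cell ahead is a wall or lies outside the grid."""
    x, y = ahead(s)
    return (x, y) in s.walls or not (0 <= x < s.width and 0 <= y < s.height)


def exec_token(token, s):
    """One transition: (state_prog)_i = exec(token_i, (state_prog)_{i-1})."""
    i = DIRECTIONS.index(s.direction)
    if token == "turnLeft":
        return replace(s, direction=DIRECTIONS[(i - 1) % 4])
    if token == "turnRight":
        return replace(s, direction=DIRECTIONS[(i + 1) % 4])
    if token == "move":
        if blocked(s):
            return s  # assumption: a blocked move is a no-op
        x, y = ahead(s)
        return replace(s, x=x, y=y)
    raise ValueError(f"unknown token: {token}")


def alpha(s):
    """Abstraction: concrete state -> (position, direction, obstacle)."""
    return (s.x, s.y), s.direction, blocked(s)


def abstract_trace(program, s0):
    """Execute token by token, recording alpha of every intermediate state."""
    trace, s = [], s0
    for token in program:
        s = exec_token(token, s)
        trace.append(alpha(s))
    return trace


if __name__ == "__main__":
    start = State(x=0, y=0, direction="east", walls=frozenset({(2, 0)}), width=4, height=4)
    for step in abstract_trace(["move", "move", "turnRight", "move"], start):
        print(step)
    # ((1, 0), 'east', True)
    # ((1, 0), 'east', True)
    # ((1, 0), 'south', False)
    # ((1, 1), 'south', False)
```

In this framing, the abstract trace returned by `abstract_trace` is the kind of per-token target a probe would be trained to predict from the LM hidden state at each position, while the concrete `State` itself is never shown to the model.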

Emergent Representations of Program Semantics in Language Models Trained on Programs
