Emergent Representations of Program Semantics in Language Models Trained on Programs

Charles Jin, Martin Rinard
2024-08-03
Abstract: We present evidence that language models (LMs) of code can learn to represent the formal semantics of programs, despite being trained only to perform next-token prediction. Specifically, we train a Transformer model on a synthetic corpus of programs written in a domain-specific language for navigating 2D grid world environments. Each program in the corpus is preceded by a (partial) specification in the form of several input-output grid world states. Despite providing no further inductive biases, we find that a probing classifier is able to extract increasingly accurate representations of the unobserved, intermediate grid world states from the LM hidden states over the course of training, suggesting the LM acquires an emergent ability to interpret programs in the formal sense. We also develop a novel interventional baseline that enables us to disambiguate what is represented by the LM as opposed to learned by the probe. We anticipate that this technique may be generally applicable to a broad range of semantic probing experiments. In summary, this paper does not propose any new techniques for training LMs of code, but develops an experimental framework for and provides insights into the acquisition and representation of formal semantics in statistical models of code. Our code is available at https://github.com/charlesjin/emergent-semantics.
Machine Learning, Artificial Intelligence, Computation and Language, Programming Languages
### What problem does this paper attempt to solve?

This paper explores whether language models (LMs) trained only for next-token prediction can learn and represent the formal semantics of programs. Specifically, the authors experimentally test the following hypothesis:

**Main Hypothesis (MH):**
- A language model trained only for next-token prediction cannot model the formal semantics of its underlying programming language.

To test this hypothesis, the authors conducted the following research:

1. **Experimental Setup**:
   - Use a synthetic dataset of programs, written in a domain-specific language, for navigating a 2D grid world.
   - Each program is preceded by a partial specification in the form of several input-output grid world states.
   - Train a Transformer model to predict the next token, and use a small probing classifier to extract information about the intermediate grid world states from the LM hidden states.
2. **Research Method**:
   - Examine whether the LM gradually acquires the ability to interpret the formal semantics of programs during training.
   - Develop a new interventional baseline that separates what the LM represents from what the probe learns, so that the semantics extracted by the probe can be attributed to the LM rather than to the probe itself.
3. **Research Results**:
   - As training progresses, the probe's accuracy in extracting intermediate states improves significantly and is correlated with the LM's ability to generate correct programs.
   - The intervention experiments further indicate that the information in the LM hidden states is not merely a record of syntax but reflects the semantics of the program.

### Summary

The main contribution of this paper is evidence that language models, although trained only on next-token prediction, can still learn the formal semantics of programs without explicit supervision. This is evidence against MH and challenges the view that language models can only generate text based on surface-level statistical correlations. In addition, the authors propose a new intervention technique that more accurately distinguishes the roles of the LM and the probe in semantic extraction.

### Formula Display

Some of the formulas involved in the paper are as follows:

- Program state transition formula:
  \[ (\text{state}_{\text{prog}})_i = \text{exec}(\text{token}_i, (\text{state}_{\text{prog}})_{i-1}) \]
- Abstract interpretation formula:
  \[ \alpha : \text{state}_{\text{prog}} \to (\text{position}, \text{direction}, \text{obstacle}) \]

These formulas show how the program state changes as each token is executed and how key features are abstracted from the concrete state.
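The two formulas can be made concrete with a toy interpreter. The following is a minimal sketch, not the paper's implementation: the token names (`move`, `turnLeft`, `turnRight`), the `State` fields, and the coordinate convention are all hypothetical and chosen only for illustration. `exec_token` plays the role of exec, `trace` applies it token by token to produce the intermediate states, and `alpha` maps a concrete state to the (position, direction, obstacle) abstraction.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, List, Tuple

# Hypothetical conventions: y grows downward, headings rotate clockwise.
DIRECTIONS = ["N", "E", "S", "W"]
STEP = {"N": (0, -1), "E": (1, 0), "S": (0, 1), "W": (-1, 0)}


@dataclass(frozen=True)
class State:
    """Concrete program state (assumed fields, not the paper's)."""
    x: int
    y: int
    facing: str
    walls: FrozenSet[Tuple[int, int]]
    width: int
    height: int


def cell_ahead(state: State) -> Tuple[int, int]:
    """Return the coordinates of the cell directly in front of the agent."""
    dx, dy = STEP[state.facing]
    return state.x + dx, state.y + dy


def facing_obstacle(state: State) -> bool:
    """True if the cell ahead is a wall or lies outside the grid."""
    nx, ny = cell_ahead(state)
    out_of_bounds = not (0 <= nx < state.width and 0 <= ny < state.height)
    return out_of_bounds or (nx, ny) in state.walls


def exec_token(token: str, state: State) -> State:
    """state_i = exec(token_i, state_{i-1}) for a hypothetical token set."""
    idx = DIRECTIONS.index(state.facing)
    if token == "turnLeft":
        return replace(state, facing=DIRECTIONS[(idx - 1) % 4])
    if token == "turnRight":
        return replace(state, facing=DIRECTIONS[(idx + 1) % 4])
    if token == "move":
        if facing_obstacle(state):
            return state  # a blocked move leaves the state unchanged
        nx, ny = cell_ahead(state)
        return replace(state, x=nx, y=ny)
    raise ValueError(f"unknown token: {token}")


def trace(tokens: List[str], state: State) -> List[State]:
    """Return the intermediate state after each token is executed."""
    states = []
    for token in tokens:
        state = exec_token(token, state)
        states.append(state)
    return states


def alpha(state: State) -> Tuple[Tuple[int, int], str, bool]:
    """Abstract a concrete state to (position, direction, obstacle)."""
    return (state.x, state.y), state.facing, facing_obstacle(state)
```

In a probing experiment of the kind described above, the values of `alpha` at each step would serve as the labels a probe tries to predict from the LM hidden state at the corresponding token. How the paper pairs tokens with hidden states is not specified here, so that pairing is left out of the sketch.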