Naturalness of Attention: Revisiting Attention in Code Language Models

Mootez Saad, Tushar Sharma

2023-11-23

Abstract: Language models for code such as CodeBERT offer the capability to learn advanced source code representation, but their opacity poses barriers to understanding of captured properties. Recent attention analysis studies provide initial interpretability insights by focusing solely on attention weights rather than considering the wider context modeling of Transformers. This study aims to shed some light on the previously ignored factors of the attention mechanism beyond the attention weights. We conduct an initial empirical study analyzing both attention distributions and transformed representations in CodeBERT. Across two programming languages, Java and Python, we find that the scaled transformation norms of the input better capture syntactic structure compared to attention weights alone. Our analysis reveals a characterization of how CodeBERT embeds syntactic code properties. The findings demonstrate the importance of incorporating factors beyond just attention weights for rigorously understanding neural code models. This lays the groundwork for developing more interpretable models and effective uses of attention mechanisms in program analysis.

Software Engineering, Machine Learning

What problem does this paper attempt to address?

This paper addresses a gap in how attention in code language models such as CodeBERT is studied: existing analyses focus mainly on the attention weights and ignore the transformation applied to the input. Looking at the weights alone may give an incomplete picture of how the model captures properties of code. The paper therefore re-examines the attention mechanism by analyzing both the attention weights \(\alpha\) and the norms of the scaled input transformation, \(\lVert \alpha f(x) \rVert\), to understand more fully how these models embed the syntactic structure of source code. Specifically, it has two main objectives:

1. **Inter-layer trend analysis**: Study how the overall trends of the attention weights \(\alpha\) and of the norms \(\lVert \alpha f(x) \rVert\) differ across the layers of CodeBERT.
2. **Syntactic alignment analysis**: Compare how well \(\lVert \alpha f(x) \rVert\) and the attention weights align with the syntactic structure of source code.

Through these analyses, the authors aim to reveal deeper characteristics of the attention mechanism in code representation and provide a basis for developing more interpretable models.
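To make the two quantities concrete, here is a minimal NumPy sketch for a single attention head on toy data. It is not the paper's implementation: it assumes \(f(x)\) is just the value projection of each input token (no bias, no output projection), and the function name `attention_and_scaled_norms`, the weight matrices, and the random inputs are all hypothetical. Because the attention weights are non-negative, \(\lVert \alpha_{ij} f(x_j) \rVert = \alpha_{ij} \lVert f(x_j) \rVert\), which is what the code computes.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention_and_scaled_norms(x, w_q, w_k, w_v):
    """Return attention weights alpha and the norms ||alpha * f(x)||.

    Assumption: f(x_j) is the value projection x_j @ w_v of token j.
    Both returned arrays have shape (n_tokens, n_tokens); entry [i, j]
    describes how token i attends to token j.
    """
    d_k = w_q.shape[1]
    q, k = x @ w_q, x @ w_k
    alpha = softmax(q @ k.T / np.sqrt(d_k), axis=-1)

    f_x = x @ w_v                            # transformed inputs f(x_j)
    f_norm = np.linalg.norm(f_x, axis=-1)    # ||f(x_j)|| for each token j

    # alpha >= 0, so ||alpha_ij * f(x_j)|| = alpha_ij * ||f(x_j)||
    scaled = alpha * f_norm[None, :]
    return alpha, scaled


# Toy data standing in for token representations of a code snippet.
rng = np.random.default_rng(0)
n_tokens, d_model, d_head = 5, 16, 8
x = rng.normal(size=(n_tokens, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))

alpha, scaled = attention_and_scaled_norms(x, w_q, w_k, w_v)

# The token each position attends to most can differ between the two views.
print("argmax by attention weight:", alpha.argmax(axis=-1))
print("argmax by scaled norm:     ", scaled.argmax(axis=-1))
```

A token with a large attention weight but a small \(\lVert f(x) \rVert\) contributes little to the output, which is why the two views can rank tokens differently.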

An Exploratory Study on Code Attention in BERT

Extracting Meaningful Attention on Source Code: An Empirical Study of Developer and Neural Model Code Exploration

What Do They Capture? -- A Structural Analysis of Pre-Trained Language Models for Source Code

Attention Flows: Analyzing and Comparing Attention Mechanisms in Language Models

Attention in Natural Language Processing

CodeArt: Better Code Models by Attention Regularization When Symbols Are Lacking

Interrogating the Explanatory Power of Attention in Neural Machine Translation

An Analysis of Attention via the Lens of Exchangeability and Latent Variable Models

Opening the Black Box: Analyzing Attention Weights and Hidden States in Pre-trained Language Models for Non-language Tasks

Follow-up Attention: An Empirical Study of Developer and Neural Model Code Exploration

Attention Interpretability Across NLP Tasks

Understanding Long Programming Languages with Structure-Aware Sparse Attention

BERTology Meets Biology: Interpreting Attention in Protein Language Models

Is Model Attention Aligned with Human Attention? An Empirical Study on Large Language Models for Code Generation

Towards Modeling Human Attention from Eye Movements for Neural Source Code Summarization

A Novel Perspective to Look at Attention: Bi-level Attention-based Explainable Topic Modeling for News Classification

Syntax-based Attention Model for Natural Language Inference

Rethinking the Role of Attention Mechanism: A Causality Perspective

Why Attentions May Not Be Interpretable?