Abstract: Transformers are now widely used in code intelligence tasks. To better fit highly structured source code, various kinds of structural information are fed into Transformers, such as positional encodings and abstract syntax tree (AST) based structures. However, it is still unclear how these structural features affect code intelligence tasks such as code summarization. Answering this question is vital for designing Transformer-based code models. Existing works are keen to introduce various kinds of structural information into Transformers but lack persuasive analysis revealing their contributions and interaction effects. In this paper, we conduct an empirical study of frequently used code structure features for code representation, including two types of positional encoding features and AST-based structure features. We propose a couple of probing tasks to detect how these structure features perform in Transformers, and we conduct comprehensive ablation studies to investigate how they affect code semantic summarization tasks. To further validate the effectiveness of code structure features in code summarization, we evaluate Transformer models equipped with these features on a structure-dependent summarization dataset. Our experimental results reveal several findings that may inspire future study: (1) the influences of absolute and relative positional embeddings in Transformers conflict with each other; (2) AST-based code structure features and relative position encoding features are strongly correlated, and their contributions to code semantic summarization tasks overlap substantially; (3) Transformer models still have room for improvement in explicitly understanding code structure information.
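The two positional encoding types and the AST-based structure features compared above can be made concrete with a small sketch. It assumes common textbook formulations: sinusoidal absolute encodings as in the original Transformer, clipped relative offsets in the style of Shaw et al., and pairwise shortest-path distance between AST nodes as one example of a tree-structural feature. These are not necessarily the exact variants evaluated in the study, and the function names and the parent-array encoding of the AST are hypothetical, chosen only for illustration.

```python
import math


def sinusoidal_absolute_pe(seq_len, d_model):
    """Absolute PE: one d_model-dim vector per token position (sinusoidal form)."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe


def relative_position_ids(seq_len, max_dist):
    """Relative PE: clipped pairwise offsets j - i, shifted to non-negative ids.

    Each id would index a learned embedding table (hypothetical here).
    """
    return [
        [max(-max_dist, min(max_dist, j - i)) + max_dist for j in range(seq_len)]
        for i in range(seq_len)
    ]


def ast_tree_distance(parent):
    """AST structure feature: pairwise shortest-path distance between AST nodes.

    Hypothetical encoding: parent[k] is the index of node k's parent, -1 for the root.
    """
    n = len(parent)

    def path_to_root(k):
        # Nodes from k up to the root, so list index = distance from k.
        path = []
        while k != -1:
            path.append(k)
            k = parent[k]
        return path

    paths = [path_to_root(k) for k in range(n)]
    dist = [[0] * n for _ in range(n)]
    for a in range(n):
        depth_from_a = {node: d for d, node in enumerate(paths[a])}
        for b in range(n):
            # The first node on b's root path that also lies on a's root path
            # is their lowest common ancestor; sum the two distances to it.
            for d_b, node in enumerate(paths[b]):
                if node in depth_from_a:
                    dist[a][b] = depth_from_a[node] + d_b
                    break
    return dist


# Toy AST for `x = a + b`: 0 Assign, 1 Name(x), 2 BinOp, 3 Name(a), 4 Name(b)
tree_dist = ast_tree_distance([-1, 0, 0, 2, 2])
rel_ids = relative_position_ids(4, 2)
abs_pe = sinusoidal_absolute_pe(3, 4)
```

In a model, the absolute vectors would typically be added to the token embeddings, while the relative ids and tree distances would index learned embeddings that enter the attention computation. Each relative id depends only on the offset j - i, so relative encodings are unchanged when the whole sequence is shifted, whereas absolute vectors differ at every position.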

Rethinking Positional Encoding in Tree Transformer for Code Representation

Integrating Tree Path in Transformer for Code Representation

TreeCoders: Trees of Transformers

Algebraic Positional Encodings

Learning Program Representations with a Tree-Structured Transformer

Dynamically Relative Position Encoding-Based Transformer for Automatic Code Edit

Rethinking Structural Encodings: Adaptive Graph Transformer for Node Classification Task

A Closer Look into Transformer-Based Code Intelligence Through Code Transformation: Challenges and Opportunities

Rethinking Positional Encoding in Language Pre-training

Comparing Graph Transformers via Positional Encodings

On Tree-Based Neural Sentence Modeling

A Simple and Effective Positional Encoding for Transformers

Improving Transformers using Faithful Positional Encoding

An Extensive Study of the Structure Features in Transformer-based Code Semantic Summarization

Sneaking Syntax into Transformer Language Models with Tree Regularization

Integrating Non-Fourier and AST-Structural Relative Position Representations Into Transformer-Based Model for Source Code Summarization

AST-Trans: Code Summarization with Efficient Tree-Structured Attention

Reach the Remote Neighbors: Dual-Encoding Transformer for Graphs

An Augmented Transformer Architecture for Natural Language Generation Tasks

Structural and positional ensembled encoding for Graph Transformer