Abstract: As basic elements of programs, variables convey essential information that is critical for program comprehension and maintenance. However, understanding the meaning of variables is not always easy for developers, because poor-quality variable names are prevalent and such variables are less informative for program comprehension. Therefore, in this paper, we aim to generate concise natural language explanations for variables to facilitate program comprehension. Variable explanation generation poses two challenges: the lack of training data and the association with complex code contexts around the variable. To address these issues, we propose a novel approach, ZeroVar, which leverages code pre-trained models and zero-shot prompt learning to generate explanations for a variable based on its code context. ZeroVar contains two stages: (i) a pre-training stage that continually pre-trains a base model (i.e., CodeT5) to recover the randomly masked parameter descriptions in method docstrings; and (ii) a zero-shot prompt learning stage that leverages the pre-trained model to generate explanations for a given variable via a prompt constructed from the variable and its enclosing method context. We then extensively evaluate the quality and usefulness of the variable explanations generated by ZeroVar. We construct an evaluation dataset of 773 variables and their reference explanations. Our results show that ZeroVar generates higher-quality explanations than baselines, not only on automated metrics such as BLEU and ROUGE, but also on human metrics such as correctness, completeness, and conciseness. Moreover, we assess the usefulness of ZeroVar-generated explanations on two downstream tasks related to variable naming quality, i.e., abbreviation expansion and spelling correction.
For abbreviation expansion, the generated variable explanations help improve the present rate (+13.1%), precision (+3.6%), and recall (+10.0%) of the state-of-the-art abbreviation expansion approach. For spelling correction, using the generated explanations achieves higher hit@1 (+162.9%) and hit@3 (+49.6%) than the recent variable representation learning approach.
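
The two stages described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the docstring format, the mask token, or the prompt template, so everything below is an assumption made for illustration: Javadoc-style `@param` lines, a T5-style sentinel token, and the hypothetical helpers `make_pretraining_example` and `build_prompt`. The sketch only prepares text inputs and targets; it does not load or train a model.

```python
import random
import re

# Assumed mask: a T5-style sentinel token. The abstract does not name the token.
MASK = "<extra_id_0>"

# Assumed docstring format: Javadoc-style "@param <name> <description>" lines.
# Group 1 is the parameter name, group 2 is its description.
PARAM_RE = re.compile(r"^[ \t]*\*?[ \t]*@param[ \t]+(\w+)[ \t]+(.+)$", re.MULTILINE)


def make_pretraining_example(method_source, rng):
    """Stage (i) sketch: mask one randomly chosen parameter description.

    Returns (masked_source, target), or None when the method has no
    documented parameters. Hypothetical helper, not the authors' code.
    """
    matches = list(PARAM_RE.finditer(method_source))
    if not matches:
        return None
    chosen = rng.choice(matches)
    start, end = chosen.span(2)  # span of the description text only
    masked = method_source[:start] + MASK + method_source[end:]
    target = f"{MASK} {chosen.group(2).strip()}"
    return masked, target


def build_prompt(method_source, variable):
    """Stage (ii) sketch: present the variable as if it were a masked parameter.

    The template (a synthetic docstring placed before the method) is an
    assumption; the abstract only says the prompt is built from the variable
    and its method context.
    """
    return f"/**\n * @param {variable} {MASK}\n */\n{method_source}"
```

Under these assumptions, a model trained on the `(masked_source, target)` pairs learns to fill in a parameter description from the surrounding method, and the zero-shot prompt reuses that same fill-in format for an arbitrary local variable, which is why no labeled variable explanations would be needed.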

VarGAN: Adversarial Learning of Variable Semantic Representations

Versatile Auxiliary Regressor with Generative Adversarial network (VAR+GAN)

Disentangling Factors of Variation in Deep Representations Using Adversarial Training.

PinNet: Pinpoint Instructive Information for Retrieval Augmented Code-to-Text Generation

Feature Augmentation for Adversarial Robustness

CVAE-GAN: Fine-Grained Image Generation through Asymmetric Training

Generating Variable Explanations Via Zero-shot Prompt Learning.

Class Balancing GAN with a Classifier in the Loop

Enhancing the Transferability of Adversarial Attacks through Variance Tuning

When Molecular GAN Meets Byte-Pair Encoding

VARGAN: Variance Enforcing Network Enhanced GAN

Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction

Symbol Preference Aware Generative Models for Recovering Variable Names from Stripped Binary

Feature Variance Regularization: A Simple Way to Improve the Generalizability of Neural Networks

Learning to Represent Programs with Graphs

On the Anomalous Generalization of GANs

StyleGenes: Discrete and Efficient Latent Distributions for GANs

Towards Video Anomaly Retrieval from Video Anomaly Detection: New Benchmarks and Model

A Bayesian Non-parametric Approach to Generative Models: Integrating Variational Autoencoder and Generative Adversarial Networks using Wasserstein and Maximum Mean Discrepancy

Toward Video Anomaly Retrieval From Video Anomaly Detection: New Benchmarks and Model

Multilinear Latent Conditioning for Generating Unseen Attribute Combinations