Abstract: In the last few years, applying language models to source code has become the state-of-the-art approach to code completion. However, compared with natural language, code exhibits more pronounced repetition. For example, a variable is likely to be used many times in the code that follows its first appearance. Cloned code and templates also exhibit token repetition. Capturing the token repetition of source code is therefore important. Across projects, variables and types are usually named differently, which means that a model trained on a finite data set will encounter many unseen variables or types in another data set. How to model the semantics of unseen data and how to predict unseen data based on patterns of token repetition are two challenges in code completion. Hence, in this paper, we model token repetition as a graph and propose a novel REP model, based on a deep neural graph network, to learn code token repetition. The REP model identifies the edge connections of the graph in order to recognize token repetition. To predict the token repetition of token [Formula: see text], the information of all previous tokens needs to be considered. We use a memory neural network (MNN) to model the semantics of each distinct token, which makes the framework of the REP model more targeted. The experiments indicate that the REP model performs better than the LSTM model. In comparison with the Attention-Pointer network, we also find that the attention mechanism does not work in all situations. The proposed REP model achieves similar or slightly better prediction accuracy than the Attention-Pointer network while consuming less training time. We also find other attention mechanisms that could further improve the prediction accuracy.
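
The graph view of token repetition described in the abstract can be illustrated with a minimal sketch. This is not the REP model itself: it only builds the kind of repetition edges such a model would have to predict. It assumes, for illustration, that each repeated token is linked to its most recent earlier occurrence. The function names `repetition_edges` and `repetition_rate` are hypothetical and do not come from the paper.

```python
from typing import Dict, List, Tuple


def repetition_edges(tokens: List[str]) -> List[Tuple[int, int]]:
    """Return (earlier_index, later_index) pairs for repeated tokens.

    Assumption for illustration: each repeated token is linked to its
    most recent previous occurrence. The paper may define edges differently.
    """
    last_seen: Dict[str, int] = {}
    edges: List[Tuple[int, int]] = []
    for i, tok in enumerate(tokens):
        if tok in last_seen:
            edges.append((last_seen[tok], i))
        last_seen[tok] = i  # remember the latest position of this token
    return edges


def repetition_rate(tokens: List[str]) -> float:
    """Fraction of tokens that repeat an earlier token in the sequence."""
    if not tokens:
        return 0.0
    return len(repetition_edges(tokens)) / len(tokens)


# A tiny hand-tokenized snippet: `count = 0; count = count + 1;`
tokens = ["count", "=", "0", ";", "count", "=", "count", "+", "1", ";"]
edges = repetition_edges(tokens)
rate = repetition_rate(tokens)
```

On this snippet, four of the ten tokens repeat an earlier token, which gives a rough sense of why repetition-aware prediction can help with identifiers that a fixed vocabulary has never seen.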