Learning Embeddings of API Tokens to Facilitate Deep Learning Based Program Processing

Yangyang Lu,Ge Li,Rui Miao,Zhi Jin
DOI: https://doi.org/10.1007/978-3-319-47650-6_42
2016-01-01
Abstract:Deep learning has been applied for processing programs in recent years and gains extensive attention on the academic and industrial communities. In analogous to process natural language data based on word embeddings, embeddings of tokens (e.g. classes, variables, methods etc.) provide an important basis for processing programs with deep learning. Nowadays, lots of real-world programs rely on API libraries for implementation. They contain numbers of API tokens (e.g. API related classes, interfaces, methods etc.), which indicate notable semantics of programs. However, learning embeddings of API tokens is not exploited yet. In this paper, we propose a neural model to learn embeddings of API tokens. Our model combines a recurrent neural network with a convolutional neural network. And we use API documents as training corpus. Our model is trained on documents of five popular API libraries and evaluated on a description selecting task. To our best knowledge, this paper is the first to learn embeddings of API tokens and takes a meaningful step to facilitate deep learning based program processing.
What problem does this paper attempt to address?