DLInfer: Deep Learning with Static Slicing for Python Type Inference.

Yanyan Yan,Yang Feng,Hongcheng Fan,Baowen Xu
DOI: https://doi.org/10.1109/icse48619.2023.00170
2023-01-01
Abstract:Python programming language has gained enormous popularity in the past decades. While its flexibility significantly improves software development productivity, the dynamic typing feature challenges software maintenance and quality assurance. To facilitate programming and type error checking, the Python programming language has provided a type hint mechanism enabling developers to annotate type information for variables. However, this manual annotation process often requires plenty of resources and may introduce errors. In this paper, we propose a deep learning type inference technique, namely DLInfer, to automatically infer the type information for Python programs. DLInfer collects slice statements for variables through static analysis and then vectorizes them with the Unigram Language Model algorithm. Based on the vectorized slicing features, we designed a bi-directional gated recurrent unit model to learn the type propagation information for inference. To validate the effectiveness of DLInfer, we conduct an extensive empirical study on 700 open-source projects. We evaluate its accuracy in inferring three kinds of fundamental types, including built-in, library, and user-defined types. By training with a large-scale dataset, DLInfer achieves an average of 98.79% Top-1 accuracy for the variables that can get type information through static analysis and manual annotation. Further, DLInfer achieves 83.03% type inference accuracy on average for the variables that can only obtain the type information through dynamic analysis. The results indicate DLInfer is highly effective in inferring types. It is promising to apply it to assist in various software engineering tasks for Python programs.
What problem does this paper attempt to address?