C2S: Translating Natural Language Comments to Formal Program Specifications

Juan Zhai,Yu Shi,Minxue Pan,Guian Zhou,Yongxiang Liu,Chunrong Fang,Shiqing Ma,Lin Tan,Xiangyu Zhang
DOI: https://doi.org/10.1145/3368089.3409716
2020-01-01
Abstract:Formal program specifications are essential for various software engineering tasks, such as program verification, program synthesis, code debugging and software testing. However, manually inferring formal program specifications is not only time-consuming but also error-prone. In addition, it requires substantial expertise. Natural language comments contain rich semantics about behaviors of code, making it feasible to infer program specifications from comments. Inspired by this, we develop a tool, named C2S, to automate the specification synthesis task by translating natural language comments into formal program specifications. Our approach firstly constructs alignments between natural language word and specification tokens from existing comments and their corresponding specifications. Then for a given method comment, our approach assembles tokens that are associated with words in the comment from the alignments into specifications guided by specification syntax and the context of the target method. Our tool successfully synthesizes 1,145 specifications for 511 methods of 64 classes in 5 different projects, substantially outperforming the state-of-the-art. The generated specifications are also used to improve a number of software engineering tasks like static taint analysis, which demonstrates the high quality of the specifications.
What problem does this paper attempt to address?