Documentation-based Functional Constraint Generation for Library Methods

Renhe Jiang, Zhengzhao Chen, Yu Pei, Minxue Pan, Tian Zhang, Xuandong Li
DOI: https://doi.org/10.1002/stvr.1785
2021-01-01
Abstract: Although software libraries promote code reuse and facilitate software development, they increase the complexity of programme analysis tasks. To effectively analyse programmes built on top of software libraries, it is essential to have specifications for the library methods that can be easily processed by analysis tools. However, the availability of such specifications is seriously limited at the moment. Manually writing the specifications can be prohibitively expensive and error-prone, while existing automated approaches to inferring the specifications seldom produce results that are strong enough to be used in programme analysis. In this work, we propose the DOC2SMT approach to generating strong functional constraints in SMT for library methods based on their documentations. DOC2SMT first applies natural language processing (NLP) techniques and a set of rules to translate a method's natural language documentation into a large number of candidate constraint clauses in OCL. Then, it utilises a manually enhanced domain model to identify OCL candidate constraint clauses that comply with the problem domain in static validation, translates well-formed OCL constraints into the SMT-LIB format, and checks whether each SMT-LIB constraint rightly abstracts the functionalities of the method under consideration via testing in dynamic validation. In the end, it reports the first functional constraint that survives both validations to the user as the result. We have implemented the approach into a supporting tool with the same name. In experiments conducted on 451 methods from the Java Collections Framework and the Java IO library, DOC2SMT generated correct constraints for 309 methods, with the average generation time for each correct constraint being merely 2.7 min. We have also applied the generated constraints to facilitate symbolic-execution-based test generation with the Symbolic Java PathFinder (SPF) tool. For 24 utility methods manipulating Java container and IO objects, SPF with access to the generated constraints produced 51.2 times more test cases than SPF without the access.
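As a rough illustration of the dynamic validation idea described in the abstract, the sketch below tests a candidate constraint for `List.add(e)` against a few concrete executions and rejects it on the first counterexample. Everything here is an assumption made for illustration: the class name `DynamicValidationSketch`, the helper `survives`, and the constraint text in `CANDIDATE_SMT` are hypothetical and not taken from DOC2SMT. The tool checks SMT-LIB constraints, whereas this sketch swaps in a plain Java predicate over the list size before and after the call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

public class DynamicValidationSketch {

    // Hypothetical SMT-LIB rendering of a candidate constraint for List.add(e):
    // the size after the call equals the size before the call plus one.
    static final String CANDIDATE_SMT = "(assert (= size_post (+ size_pre 1)))";

    // Returns true if the candidate, expressed as a predicate over the
    // pre-call and post-call sizes, holds on every test input.
    // A single counterexample is enough to reject the candidate.
    static boolean survives(BiPredicate<Integer, Integer> candidate,
                            List<List<String>> inputs) {
        for (List<String> input : inputs) {
            List<String> list = new ArrayList<>(input); // copy so inputs stay unchanged
            int sizePre = list.size();
            list.add("x");                              // method under consideration
            int sizePost = list.size();
            if (!candidate.test(sizePre, sizePost)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<List<String>> inputs =
            List.of(List.<String>of(), List.of("a"), List.of("a", "b"));

        // Candidate matching CANDIDATE_SMT: size grows by exactly one.
        boolean good = survives((pre, post) -> post == pre + 1, inputs);
        // A wrong candidate: size is unchanged by add.
        boolean bad = survives((pre, post) -> post.equals(pre), inputs);

        System.out.println(CANDIDATE_SMT + " survives: " + good);
        System.out.println("size-unchanged candidate survives: " + bad);
    }
}
```

In this simplified setting, a pipeline in the spirit of the abstract would try candidates in order and report the first one for which `survives` returns true, after a separate static check has already filtered out ill-formed candidates.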