User-assisted code query customization and optimization
Ben Liblit, Yingjun Lyu, Rajdeep Mukherjee, Omer Tripp, Yanjun Wang
DOI: https://doi.org/10.1007/s10009-024-00763-0
2024-08-31
International Journal on Software Tools for Technology Transfer
Abstract: Running static analysis rules in the wild as part of a commercial service demands special consideration of time limits and scalability, given the large and diverse real-world workloads that the rules are evaluated on. Furthermore, these rules do not run in isolation, which exposes opportunities for reuse of partial evaluation results across rules. In our work on Amazon CodeGuru Reviewer, and its underlying rule-authoring toolkit known as the Guru Query Language (GQL), we have encountered performance and scalability challenges, and identified corresponding optimization opportunities, such as caching, indexing, and customization of data-flow specification, which rule authors can take advantage of as built-in GQL constructs. Our experimental evaluation on a dataset of open-source GitHub repositories shows a 3× speedup with perfect recall using indexing-based configurations, and a 2× speedup with a 51% increase in the number of findings for caching-based optimization. Customizing the data-flow specification, such as expanding the tracking scope, can increase the number of findings by as much as 136%, at the expense of a longer analysis time. Our evaluations emphasize the importance of customizing the data-flow specification, particularly when users operate under time constraints: this customization helps the analysis complete within the given time frame, ultimately leading to improved recall.
computer science, software engineering