Customizing Static Analysis using Codesearch

Avi Hayoun, Veselin Raychev, Jack Hair
2024-04-19
Abstract: Static analysis is a growing application of software engineering, leading to a range of essential security tools, bug-finding tools, as well as software verification. Recent years show an increase of universal static analysis tools that validate a range of properties and allow customizing parts of the scanner to validate additional properties or "static analysis rules". A commonly used language to describe a range of static analysis applications is Datalog. Unfortunately, the language is still non-trivial to use, leading to analysis that is difficult to implement in a precise but performant way. In this work, we aim to make building custom static analysis tools much easier for developers, while at the same time, providing a familiar framework for application security and static analysis experts. Our approach introduces a language called StarLang, a variant of Datalog which only includes programs with a fast runtime by the means of having low time complexity of its decision procedure.
Programming Languages, Logic in Computer Science, Software Engineering
What problem does this paper attempt to address?
This paper aims to make custom static analysis tools easier to build, while still giving application security and static analysis experts a familiar framework. The authors observe that existing static analysis tools are powerful, but the languages they rely on (such as Datalog) remain too complex for non-expert users to write rules that are both precise and performant. They therefore introduce StarLang, a variant of Datalog that restricts expressive power so that queries are guaranteed to run quickly, which in turn makes static analysis rules easier to write.

### Main Problems and Solutions

1. **Complexity of Existing Static Analysis Tools**:
   - Existing static analysis tools usually describe analysis rules in languages such as Datalog. These languages are powerful, but they are hard for non-expert users to work with.
   - This complexity makes it difficult to implement analysis rules that are both precise and performant.
2. **Introduction of the StarLang Language**:
   - To simplify writing static analysis rules, the authors design StarLang, a variant of Datalog that only admits programs with a fast runtime, meaning that its decision procedure has low time complexity.
   - By limiting expressive power (for example, by not allowing explicit equality predicates), StarLang ensures that every query executes quickly, so users can see code-matching results in real time while writing rules.
3. **Template Mechanism**:
   - StarLang introduces a template mechanism that hides the recursive features of Datalog behind a higher-level abstraction, making queries easier to read and write.
   - Templates also support unbounded recursion and nested references, which adds flexibility without giving up ease of use.
4. **System Integration and Interaction**:
   - The paper describes a system, Snyk Code, that computes data flow, taint analysis, pointer analysis, and related analyses, and lets users execute StarLang queries in real time.
   - Snyk Code includes a front end named Codesearch, which lets users build static analysis queries interactively. This simplifies the work of security professionals and static analysis authors.

### Summary

The main contribution of this paper is to reduce the difficulty of writing custom static analysis rules by introducing the StarLang language and its template mechanism, while keeping sufficient expressive power and efficient query performance. Non-expert users can write static analysis rules more easily, and expert users get a familiar and powerful framework.
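To make concrete the kind of recursion that a Datalog-based taint analysis involves (and that StarLang's templates are said to hide), here is a minimal Python sketch. It is not StarLang syntax and not Snyk Code's implementation: the function names `tainted_nodes` and `reachable_sinks`, the edge-list representation of data flow, and the example node names are all assumptions made for illustration. The sketch evaluates the two classic Datalog rules shown in the docstring with a worklist, so each node is enqueued at most once and the evaluation stays linear in the size of the flow graph.

```python
from collections import deque


def tainted_nodes(flow_edges, sources):
    """Return every node reachable from a source along data-flow edges.

    Illustrative Datalog equivalent (generic Datalog, not StarLang):
        tainted(X) :- source(X).
        tainted(Y) :- tainted(X), flow(X, Y).
    """
    # Index edges by their origin so each lookup is constant time.
    successors = {}
    for src, dst in flow_edges:
        successors.setdefault(src, []).append(dst)

    tainted = set(sources)
    worklist = deque(tainted)
    while worklist:
        node = worklist.popleft()
        for nxt in successors.get(node, ()):
            if nxt not in tainted:  # each node is added and enqueued once
                tainted.add(nxt)
                worklist.append(nxt)
    return tainted


def reachable_sinks(flow_edges, sources, sinks):
    """Return the sinks that receive data from at least one source."""
    return tainted_nodes(flow_edges, sources) & set(sinks)


if __name__ == "__main__":
    # Hypothetical data-flow facts extracted from a program.
    flow = [
        ("request.args", "user_id"),
        ("user_id", "query"),
        ("query", "db.execute"),
        ("config", "logger.info"),
    ]
    print(sorted(reachable_sinks(flow, {"request.args"}, {"db.execute", "logger.info"})))
    # prints ['db.execute']
```

The second rule is recursive (`tainted` appears in its own body), which is the part a rule author would otherwise have to write and reason about by hand. A template-style abstraction would let the author state only the sources and sinks and leave the reachability step to the engine.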