Serenity: Library Based Python Code Analysis for Code Completion and Automated Machine Learning

Wenting Zhao, Ibrahim Abdelaziz, Julian Dolby, Kavitha Srinivas, Mossad Helali, Essam Mansour
DOI: https://doi.org/10.48550/arXiv.2301.05108
2023-01-05
Abstract: Dynamically typed languages such as Python have become very popular. Among other strengths, Python's dynamic nature and its straightforward linking to native code have made it the de-facto language for many research areas such as Artificial Intelligence. This flexibility, however, makes static analysis very hard. While creating a sound, or a soundy, analysis for Python remains an open problem, we present in this work Serenity, a framework for static analysis of Python that turns out to be sufficient for some tasks. The Serenity framework exploits two basic mechanisms: (a) reliance on dynamic dispatch at the core of language translation, and (b) extreme abstraction of libraries, to generate an abstraction of the code. We demonstrate the efficiency and usefulness of Serenity's analysis in two applications: code completion and automated machine learning. In these two applications, we demonstrate that such analysis has a strong signal, and can be leveraged to establish state-of-the-art performance, comparable to neural models and dynamic analysis respectively.
Programming Languages, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the difficulty of statically analyzing Python. The language's dynamic characteristics (dynamic typing, flexible object structures, and direct links to native code) have made it popular in research fields such as artificial intelligence, but they also make traditional static analysis methods hard to apply. The paper proposes a framework named Serenity, which approaches these challenges through two mechanisms:

1. **Reliance on dynamic dispatch**: Serenity uses dynamic dispatch, a core feature of the Python language, to translate uncertain constructs (such as object creation or function calls) into type-based dynamic dispatch. This lets the analysis handle many of Python's complex dynamic behaviors.
2. **Extreme abstraction of libraries**: User code usually depends on APIs to create and operate on domain objects (such as arrays in NumPy). Serenity simplifies the analysis by abstracting libraries away and tracking only the objects they create and the methods called on those objects. This abstraction is not fully precise, but it is sufficient to support certain tasks.

With these two mechanisms, Serenity performs well in two application scenarios:

- **Code completion**: Serenity's data-flow analysis identifies the code fragments relevant to a specific code location. Combined with the local program context, this gives better code completion performance than relying on the context alone.
- **Automated Machine Learning (AutoML)**: Serenity's static analysis can be used to generate effective machine learning pipelines, with performance comparable to that of dynamic analysis methods. Because static analysis does not need to run these pipelines, it saves considerable time and resources.
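To make the idea of extreme library abstraction concrete, the following is a toy sketch and not Serenity's actual implementation. It uses Python's standard `ast` module to treat every imported name as an opaque abstract object and records only the access paths of calls made on values derived from it, without ever loading the libraries. The function name `trace_library_calls` and the sample program are hypothetical, and the sketch assumes top-level, straight-line statements only (no control flow, functions, or keyword-argument tracking).

```python
import ast


def trace_library_calls(source: str) -> list[str]:
    """Return access paths of calls on values that originate from imports.

    Hypothetical helper for illustration: library internals are never
    inspected; each library value is represented only by its access path.
    """
    env: dict[str, str] = {}  # variable name -> abstract access path
    calls: list[str] = []     # recorded library calls, in source order

    def path_of(node):
        # Resolve an expression to an abstract path, or None if the
        # expression is not derived from an imported library.
        if isinstance(node, ast.Name):
            return env.get(node.id)
        if isinstance(node, ast.Attribute):
            base = path_of(node.value)
            return f"{base}.{node.attr}" if base else None
        if isinstance(node, ast.Call):
            base = path_of(node.func)
            for arg in node.args:  # positional arguments may contain calls too
                path_of(arg)
            if base:
                calls.append(base)
                return f"{base}()"  # the call result is a new abstract object
        return None

    for stmt in ast.parse(source).body:
        if isinstance(stmt, ast.Import):
            for alias in stmt.names:
                if alias.asname:
                    env[alias.asname] = alias.name
                else:
                    # "import a.b" binds only the top-level name "a"
                    top = alias.name.split(".")[0]
                    env[top] = top
        elif isinstance(stmt, ast.ImportFrom):
            for alias in stmt.names:
                prefix = f"{stmt.module}." if stmt.module else ""
                env[alias.asname or alias.name] = prefix + alias.name
        elif isinstance(stmt, ast.Assign):
            path = path_of(stmt.value)
            if path:
                for target in stmt.targets:
                    if isinstance(target, ast.Name):
                        env[target.id] = path
        elif isinstance(stmt, ast.Expr):
            path_of(stmt.value)
    return calls


# Hypothetical user program; it is only parsed, never executed.
SAMPLE = """
import pandas as pd
from sklearn.svm import SVC
df = pd.read_csv("data.csv")
X = df.drop(columns=["label"])
clf = SVC()
clf.fit(X, df["label"])
"""

for call in trace_library_calls(SAMPLE):
    print(call)
```

Running the sketch prints `pandas.read_csv`, `pandas.read_csv().drop`, `sklearn.svm.SVC`, and `sklearn.svm.SVC().fit`. Neither pandas nor scikit-learn needs to be installed, which mirrors the point that the abstraction tracks only which objects libraries create and which methods are called on them.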
In short, the goal of the Serenity framework is to provide static analysis of Python code that is sufficient, through the two mechanisms above, to support applications such as code completion and automated machine learning. It does not solve Python static analysis in general, since the authors note that a sound (or soundy) analysis remains an open problem, but it demonstrates the potential of static analysis in these fields.