Abstract: Dynamically typed languages such as Python have become very popular. Among other strengths, Python's dynamic nature and its straightforward linking to native code have made it the de-facto language for many research areas such as Artificial Intelligence. This flexibility, however, makes static analysis very hard. While creating a sound, or a soundy, analysis for Python remains an open problem, we present in this work Serenity, a framework for static analysis of Python that turns out to be sufficient for some tasks. The Serenity framework exploits two basic mechanisms: (a) reliance on dynamic dispatch at the core of language translation, and (b) extreme abstraction of libraries, to generate an abstraction of the code. We demonstrate the efficiency and usefulness of Serenity's analysis in two applications: code completion and automated machine learning. In these two applications, we demonstrate that such analysis has a strong signal, and can be leveraged to establish state-of-the-art performance, comparable to neural models and dynamic analysis respectively.

What problem does this paper attempt to address?

The main problem this paper addresses is the difficulty of statically analyzing Python, in particular the challenges posed by its dynamic features: dynamic typing, flexible object structures, and direct links to native code. These features have helped make Python popular in research fields such as artificial intelligence, but they also make traditional static analysis methods hard to apply. The paper therefore proposes a framework named Serenity, which addresses these challenges through two mechanisms:

1. **Reliance on dynamic dispatch**: Serenity places dynamic dispatch at the core of its translation of the language, converting uncertain constructs (such as object creation or function calls) into type-based dynamic dispatch. This approach can handle many of Python's complex dynamic behaviors.
2. **Extreme abstraction of libraries**: User code usually depends on library APIs to create and operate on domain objects (such as NumPy arrays). Serenity simplifies the analysis by abstracting libraries, tracking only the objects they create and the methods called on them. This abstraction is not fully precise, but it is sufficient to support certain tasks.

With these two mechanisms, Serenity performs well in two application scenarios:

- **Code completion**: Serenity's data-flow analysis can focus on the code fragments relevant to a specific code location and, combined with the local program context, provides better code completion than relying on the context alone.
- **Automated machine learning (AutoML)**: Serenity's static analysis can generate effective machine-learning pipelines whose performance is comparable to that of dynamic analysis methods. The advantage of static analysis is that it does not need to run these pipelines, which saves considerable time and resources.

In general, the goal of Serenity is to provide static analysis of Python code that is sufficient, through the two mechanisms above, to support applications such as code completion and automated machine learning. It does not solve Python static analysis in general (the abstract notes that a sound, or soundy, analysis remains an open problem), but it demonstrates the potential of static analysis in these fields.
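The idea of extreme library abstraction can be illustrated with a small toy analysis. The sketch below is not Serenity's implementation; it is a minimal illustration, assuming that every value reached through an imported name can be treated as an opaque object identified only by the chain of attribute accesses and calls that produced it. The function names `access_path` and `abstract_library_usage` are hypothetical, and the sketch only looks at top-level statements, ignoring control flow, function bodies, and aliasing.

```python
import ast


def access_path(node, origins):
    """Return the abstract library path for an expression, or None."""
    if isinstance(node, ast.Name):
        return origins.get(node.id)
    if isinstance(node, ast.Attribute):
        base = access_path(node.value, origins)
        return None if base is None else f"{base}.{node.attr}"
    if isinstance(node, ast.Call):
        base = access_path(node.func, origins)
        return None if base is None else f"{base}()"
    return None


def abstract_library_usage(source):
    """Toy abstraction (an assumption, not Serenity's algorithm).

    Returns (origins, calls): origins maps each top-level name to the
    library path that created it; calls lists every library call seen.
    """
    origins = {}
    calls = []
    for stmt in ast.parse(source).body:
        if isinstance(stmt, ast.Import):
            for alias in stmt.names:
                if alias.asname:
                    origins[alias.asname] = alias.name
                else:
                    # "import a.b" binds only the top-level name "a".
                    top = alias.name.split(".")[0]
                    origins[top] = top
            continue
        if isinstance(stmt, ast.ImportFrom):
            if stmt.module:
                for alias in stmt.names:
                    bound = alias.asname or alias.name
                    origins[bound] = f"{stmt.module}.{alias.name}"
            continue
        # Record library calls before updating bindings, so that
        # "x = x.f()" resolves the old meaning of x.
        for node in ast.walk(stmt):
            if isinstance(node, ast.Call):
                path = access_path(node, origins)
                if path is not None:
                    calls.append(path)
        if (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)):
            name = stmt.targets[0].id
            path = access_path(stmt.value, origins)
            if path is not None:
                origins[name] = path
            else:
                # Rebound to a non-library value: stop tracking it.
                origins.pop(name, None)
    return origins, calls


SAMPLE = """
import numpy as np
from sklearn.linear_model import LogisticRegression
x = np.zeros(3)
y = x.reshape(3, 1)
model = LogisticRegression()
model.fit(y, x)
"""

if __name__ == "__main__":
    origins, calls = abstract_library_usage(SAMPLE)
    for call in calls:
        print(call)
```

Note that the analyzed code is never imported or executed: `y` is known only as "the result of calling `reshape` on the result of `numpy.zeros`", which is imprecise but still records which library objects flow into which calls.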

Serenity: Library Based Python Code Analysis for Code Completion and Automated Machine Learning
