Abstract: Statically analyzing dynamically-typed code is challenging: even seemingly simple tasks, such as determining the targets of procedure calls, are non-trivial when the types of objects are unknown at compile time. To address this challenge, gradual typing is increasingly added to dynamically-typed languages, a prominent example being TypeScript, which introduces static typing to JavaScript. Gradual typing improves the developer's ability to verify program behavior, contributing to robust, secure, and debuggable programs. In practice, however, users annotate types only sparsely. At the same time, conventional type inference faces performance-related challenges as program size grows. Statistical techniques based on machine learning offer faster inference, but although recent approaches demonstrate improved overall accuracy, they still perform significantly worse on user-defined types than on the most common built-in types. Their real-world usefulness is limited further because they rarely integrate with user-facing applications. We propose CodeTIDAL5, a Transformer-based model trained to reliably predict type annotations. For effective result retrieval and re-integration, we extract usage slices from a program's code property graph. Compared against recent neural type inference systems, our model outperforms the current state of the art by 7.85% on the ManyTypes4TypeScript benchmark, achieving 71.27% accuracy overall. Furthermore, we present JoernTI, an integration of our approach into Joern, an open-source static analysis tool, and demonstrate that the analysis benefits from the additional type information. Because our model allows for fast inference even on commodity CPUs, making our system available through Joern makes it highly accessible and facilitates security research.
