Getting into the Flow: Towards Better Type Error Messages for Constraint-Based Type Inference

Ishan Bhanuka, Lionel Parreaux, David Binder, Jonathan Immanuel Brachthäuser
DOI: https://doi.org/10.1145/3622812
2024-02-20
Abstract: Creating good type error messages for constraint-based type inference systems is difficult. Typical type error messages reflect implementation details of the underlying constraint-solving algorithms rather than the specific factors leading to type mismatches. We propose using subtyping constraints that capture data flow to classify and explain type errors. Our algorithm explains type errors as faulty data flows, which programmers are already used to reasoning about, and illustrates these data flows as sequences of relevant program locations. We show that our ideas and algorithm are not limited to languages with subtyping, as they can be readily integrated with Hindley-Milner type inference. In addition to these core contributions, we present the results of a user study to evaluate the quality of our messages compared to other implementations. While the quantitative evaluation does not show that flow-based messages improve the localization or understanding of the causes of type errors, the qualitative evaluation suggests a real need and demand for flow-based messages.
Programming Languages, Human-Computer Interaction
What problem does this paper attempt to address?
The problem this paper attempts to solve is **how to generate better type error messages for constraint-based type inference systems**. Existing compilers typically report type errors in terms that reflect implementation details of the underlying constraint-solving algorithm rather than the specific factors that led to the type mismatch. This makes it hard for programmers to understand the cause of an error and fix it.

### Main objectives and methods of the paper

1. **Introduce the concept of data flow**:
   - The authors propose using subtyping constraints to capture data flow and to classify and explain type errors accordingly. Error messages then describe faulty data flows, which programmers are already used to reasoning about.
   - A data flow is presented as a sequence of relevant program locations, which helps programmers trace where a mismatched value originates and how it propagates.
2. **Extend to Hindley-Milner type inference**:
   - The approach is not limited to languages with subtyping. It can also be integrated with Hindley-Milner type inference, so it applies to a wider range of languages and systems.
3. **User study evaluation**:
   - To evaluate the approach, the authors conducted a user study comparing the error messages of their proposed HMℓ system with those of other implementations (such as OCaml and Helium).
   - The quantitative evaluation did not show that HMℓ improves the localization or understanding of the causes of type errors. The qualitative evaluation, however, indicates a practical need for data-flow-based error messages when dealing with complex type errors.

### Formulas and concepts

- **Subtyping constraint**: Written \(\tau_1 <: \tau_2\), meaning that type \(\tau_1\) is a subtype of type \(\tau_2\). Read as data flow, a value of type \(\tau_1\) may flow to a place that expects \(\tau_2\).
- **Type unification error classification**: Type unification errors are classified by the number of times the data flow changes direction along the chain of constraints: a Level-\(n\) error is one in which the direction changes \(n\) times. For example, a chain with a single change of direction (Level-1 under this definition) has the shape \[ \tau_1 <: \cdots <: \tau_k >: \cdots >: \tau_m \]

### Conclusion

By generating error messages from data flows, this paper aims to improve the quality of type error messages so that programmers can understand and fix type problems more quickly and accurately. The approach applies to languages with subtyping and, through its integration with Hindley-Milner type inference, to languages without it.
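
The Level-\(n\) classification described above can be illustrated with a small sketch. This is not the paper's algorithm: it assumes a constraint chain has already been extracted and is given as the list of relations (`<:` or `>:`) between adjacent types, and it only counts how often the direction flips. The names `error_level` and `render_chain` are hypothetical and chosen for illustration.

```python
from typing import List

SUB = "<:"  # the type on the left flows into the type on the right
SUP = ">:"  # the type on the right flows into the type on the left


def error_level(relations: List[str]) -> int:
    """Return the number of direction changes along a constraint chain.

    `relations` holds the relation between each pair of adjacent types,
    e.g. ["<:", "<:", ">:"] for the chain  t1 <: t2 <: t3 >: t4.
    """
    for rel in relations:
        if rel not in (SUB, SUP):
            raise ValueError(f"unknown relation: {rel!r}")
    # A change of direction is any adjacent pair of relations that differ.
    return sum(1 for prev, cur in zip(relations, relations[1:]) if prev != cur)


def render_chain(types: List[str], relations: List[str]) -> str:
    """Pretty-print a chain such as 'int <: a >: bool'."""
    if len(types) != len(relations) + 1:
        raise ValueError("need exactly one more type than relations")
    parts = [types[0]]
    for rel, ty in zip(relations, types[1:]):
        parts += [rel, ty]
    return " ".join(parts)


if __name__ == "__main__":
    # Hypothetical chain: an int and a bool both flow into the same variable a.
    print(render_chain(["int", "a", "bool"], [SUB, SUP]))  # int <: a >: bool
    print(error_level([SUB, SUP]))  # 1
```

A chain whose relations all point the same way has level 0 in this sketch, which corresponds to a value flowing directly from its source to an incompatible use with no change of direction.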