Tarsis: An effective automata‐based abstract domain for string analysis

Luca Negrini, Vincenzo Arceri, Agostino Cortesi, Pietro Ferrara
DOI: https://doi.org/10.1002/smr.2647
2024-02-16
Journal of Software Evolution and Process
Abstract: In this paper, we introduce Tarsis, a new abstract domain based on abstract interpretation theory that approximates string values through finite state automata. The main novelty of Tarsis is that it works over an alphabet of strings instead of single characters. On the one hand, this approach requires a more complex and refined definition of the lattice operators and of the abstract semantics of string operators. On the other hand, it can obtain strictly more precise results than state‐of‐the‐art approaches. We compare Tarsis both with simpler domains and with the standard automata model, targeting case studies containing standard yet challenging string manipulations. The performance gain with respect to the standard automata model is also assessed by measuring the speed‐up achieved by Tarsis. Experiments confirm that Tarsis can obtain precise results without incurring excessive computational costs.
computer science, software engineering
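The central idea of the abstract, an automaton whose transitions are labeled with whole strings rather than single characters, can be illustrated with a minimal sketch. This is not the Tarsis implementation: the class name `StringLabelAutomaton`, its constructor arguments, and the `accepts` method are hypothetical names chosen for illustration, and the lattice operators and abstract semantics described in the paper are not modeled here. The sketch only shows how membership can be checked when each transition consumes a string label.

```python
from typing import Dict, Iterable, List, Set, Tuple


class StringLabelAutomaton:
    """Illustrative finite automaton with string-labeled transitions.

    Hypothetical helper, not part of Tarsis: it only checks membership.
    """

    def __init__(
        self,
        initial: int,
        finals: Iterable[int],
        transitions: Iterable[Tuple[int, str, int]],
    ) -> None:
        self.initial = initial
        self.finals: Set[int] = set(finals)
        # Map each source state to its outgoing (label, target) pairs.
        self.transitions: Dict[int, List[Tuple[str, int]]] = {}
        for src, label, dst in transitions:
            self.transitions.setdefault(src, []).append((label, dst))

    def accepts(self, text: str) -> bool:
        """Return True if some path from the initial state spells `text`."""
        stack: List[Tuple[int, int]] = [(self.initial, 0)]
        seen: Set[Tuple[int, int]] = set()
        while stack:
            state, pos = stack.pop()
            if (state, pos) in seen:
                continue
            seen.add((state, pos))
            if pos == len(text) and state in self.finals:
                return True
            for label, dst in self.transitions.get(state, []):
                # A transition fires only if its whole label matches at `pos`.
                if text.startswith(label, pos):
                    stack.append((dst, pos + len(label)))
        return False


# Approximates the set {"id=7", "id=42"} with three string-labeled transitions.
automaton = StringLabelAutomaton(
    initial=0,
    finals=[2],
    transitions=[(0, "id=", 1), (1, "7", 2), (1, "42", 2)],
)
```

With this automaton, `automaton.accepts("id=42")` holds while `automaton.accepts("id=4")` does not, because the label `"42"` must be consumed as a unit. A character-level automaton for the same language would need a separate transition for every character of each label.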