Xin Zhang — Research Statement
Xin Zhang
2010-01-01
Abstract: Software is becoming increasingly pervasive and complex. The goal of my research is to help developers build software that is correct, reliable, secure, and efficient. Towards this end, my research interests lie at the intersection of programming languages and software engineering, with an emphasis on program analysis, a technique that automatically reasons about program properties of interest.

Practical program analysis tools have demonstrated the ability to prove non-trivial properties of real-world software or find critical defects related to violations of these properties. Examples of such tools include Microsoft Static Driver Verifier, Astrée, Coverity, and Facebook Infer. Because program analysis problems are undecidable in general, however, these tools inevitably make approximations. These approximations, collectively called the abstraction, control the soundness, accuracy, and scalability of the tool. State-of-the-art program analysis tools rely on an expert designer to carefully choose an abstraction that they deem appropriate for all possible usage scenarios. However, due to the rich variety in analysis user expectations, individual properties of interest, and characteristics of the considered programs, there is no such one-size-fits-all abstraction. As a result, analysis tools often fail to meet the needs of individual usage scenarios, which greatly hinders their soundness, accuracy, and scalability.

My thesis research addresses this challenge by proposing a user-centric approach to program analysis. My key insight is that, instead of pursuing a one-size-fits-all abstraction when designing the analysis, we can tailor the abstraction on the fly to the needs of individual usage scenarios. Such needs concern the feedback from the analysis users, the assertions of interest, and the characteristics of the subject programs. I addressed two central technical challenges in order to enable a user-centric approach to program analysis:

1. How can we adapt an existing analysis that has a fixed abstraction, and is therefore rigid, to a given usage scenario?
2. How can we scale the proposed approach to real-world programs, properties, and users?

I addressed these challenges by proposing a unified constraint-based framework for user-centric program analysis. For separation of concerns, the framework comprises a front-end, Petablox, and a back-end, Nichrome, which address the above two challenges respectively.

Petablox addresses the first challenge for arbitrary program analyses specified in Datalog, a declarative logic programming language. Instead of using a fixed abstraction, Petablox synthesizes a family of abstractions that differ in accuracy, scalability, or sometimes soundness, and dynamically selects the abstraction that is most suitable for the current usage scenario. I refer to this problem of adapting the analysis to a given usage scenario as the user-centric analysis problem. Petablox formulates this problem as a system of mixed hard and soft constraints. While the hard constraints encode the family of viable abstractions, the soft constraints encode the objective of finding the optimum abstraction under the given usage scenario by balancing various tradeoffs.

Compared to the conventional analysis problem, which is a satisfiability problem, the user-centric analysis problem is even more challenging to solve as it is an optimization problem. Ideally, a solver for such problems should be sound (i.e., it does not violate any hard constraint), optimal (i.e., it maximizes the objective), and scalable (i.e., it can solve constraints generated from real-world programs, properties, and users). All existing solvers such as Alchemy, Tuffy, RockIt, CPI, and Z3 sacrifice one or more of the three properties. To address this challenge, Nichrome solves a system of mixed hard and soft constraints by reducing it to a (weighted partial) maximum satisfiability (MaxSAT) problem.
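
To make the notion of mixed hard and soft constraints concrete, the following Python sketch solves a tiny weighted partial MaxSAT instance by exhaustive enumeration. It only illustrates the problem formulation: it is not Nichrome's algorithm, and brute-force enumeration does not scale beyond a handful of variables. The function names, the variables `precise` and `proved`, and the weights are all hypothetical choices made for this illustration.

```python
from itertools import product

# A clause is a list of literals; a literal is a (variable, polarity) pair.
# A clause holds when at least one literal agrees with the assignment.
def satisfied(clause, assignment):
    return any(assignment[var] == polarity for var, polarity in clause)

def solve_maxsat(variables, hard, soft):
    """Brute-force weighted partial MaxSAT (illustration only).

    hard: clauses that must all hold (soundness).
    soft: (weight, clause) pairs; the objective is the total weight
          of the satisfied soft clauses (optimality).
    Returns (assignment, weight), or (None, None) if the hard clauses
    cannot all be satisfied.
    """
    best, best_weight = None, None
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if not all(satisfied(c, assignment) for c in hard):
            continue  # violates a hard constraint: never acceptable
        weight = sum(w for w, c in soft if satisfied(c, assignment))
        if best_weight is None or weight > best_weight:
            best, best_weight = assignment, weight
    return best, best_weight

# Hypothetical scenario: an assertion can be proved only under the
# more precise (and more expensive) abstraction.
variables = ["precise", "proved"]
hard = [
    [("proved", False), ("precise", True)],  # proved implies precise
]
soft = [
    (5, [("proved", True)]),    # the user wants the assertion proved
    (1, [("precise", False)]),  # cheaper abstractions are preferred
]

best, best_weight = solve_maxsat(variables, hard, soft)
print(best, best_weight)
```

In this toy instance, the benefit of proving the assertion (weight 5) outweighs the cost of the precise abstraction (weight 1), so the optimum selects the precise abstraction. Changing the weights changes which abstraction is selected, which is the kind of tradeoff the soft constraints are meant to capture.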
To enable sound, optimal, and scalable solving, I have proposed several novel techniques in both problem reduction and MaxSAT solving. While Nichrome successfully solves the user-centric analysis problem, it is not limited to program analysis and can be applied to problems in other domains such as information retrieval, machine learning, and mathematical optimization. Next, I elaborate upon Petablox and Nichrome.