Scalable and Cost-effective Data Flow Analysis for Distributed Software: Algorithms and Applications

Xiaoqin Fu
DOI: https://doi.org/10.48550/arXiv.2303.03659
2023-03-07
Abstract: More and more distributed software systems are being developed and deployed today. Like other software, distributed software systems need strong quality assurance support. Distributed software is often large and complex, has distributed components, and lacks a global clock. These characteristics make it challenging to analyze the information flow of such systems in support of software quality assurance. One challenge is that existing dynamic analysis techniques hardly scale to large, real-world distributed software systems. It is also challenging to develop cost-effective dynamic analysis approaches. Dynamic analysis algorithms and applications for distributed software face applicability and portability challenges as well. My dissertation addresses these challenges via three novel approaches to data flow analysis for distributed software. My first approach measures interprocess communications to understand distributed software behaviors and predict distributed software quality. Then, I developed an approach that pinpoints sensitive information via multi-staged, refinement-based dynamic information flow analysis for distributed software. Finally, I explored dynamic dependence analysis for distributed systems, using reinforcement learning to automatically adjust analysis configurations for scalability and better cost-effectiveness tradeoffs.
Distributed, Parallel, and Cluster Computing; Cryptography and Security; Software Engineering