On the Caching Schemes to Speed Up Program Reduction
Yongqiang Tian, Xueyan Zhang, Yiwen Dong, Zhenyang Xu, Mengxiao Zhang, Yu Jiang, Shing-Chi Cheung, Chengnian Sun
DOI: https://doi.org/10.1145/3617172
IF: 3.685
2024-01-01
ACM Transactions on Software Engineering and Methodology
Abstract: Program reduction is a highly practical, widely demanded technique to help debug language tools, such as compilers, interpreters and debuggers. Given a program P that exhibits a property ψ, program reduction conceptually applies various program transformations iteratively to generate a vast number of variants of P by deleting certain tokens, and returns the minimal variant that preserves ψ as the result. A program reduction process inevitably generates duplicate variants, and their number can be significant. Our study reveals that on average 61.8% and 24.3% of the variants generated by two representative program reducers, HDD and Perses, respectively, are duplicates. Checking them against ψ is thus redundant and wastes time and computation resources. Although it may seem that simply caching the generated variants would avoid redundant property tests, such a trivial method is impractical in the real world due to its significant memory footprint. Therefore, a memory-efficient caching scheme for program reduction is in great demand. This study is the first effort to conduct a systematic, extensive analysis of memory-efficient caching schemes for program reduction. We first propose to use two well-known compression methods, ZIP and SHA, to compress the generated variants before they are stored in the cache. Furthermore, our understanding of the program reduction process motivates us to propose a novel, domain-specific caching scheme that is both memory-efficient and computation-efficient: Refreshable Compact Caching (RCC). Our key insight is two-fold: ① by leveraging the correlation between variants and the original program P, we losslessly encode each variant into an equivalent, compact, canonical representation; ② periodically, stale cache entries, which will never be accessed again, are removed to minimize the memory footprint over time.
Our extensive evaluation on 31 real-world C compiler bugs demonstrates that caching schemes help avoid redundant queries, cutting the number of issued queries by 61.8% and 24.3% in HDD and Perses, respectively; correspondingly, the runtime performance is boosted by 22.8% and 18.2%. With regard to memory efficiency, all three methods use less memory than the state-of-the-art string-based scheme STR. Specifically, ZIP and SHA cut the memory footprint by more than 80% and 90%, respectively, in both Perses and HDD compared to STR; moreover, the highly scalable, domain-specific RCC dominates its peer schemes and outperforms SHA by 96.4% and 91.74% in HDD and Perses, respectively.
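To make the caching idea in the abstract concrete, the following is a minimal Python sketch of a variant cache with three key schemes named after STR, ZIP and SHA. It is an illustration under stated assumptions, not the authors' implementation: the class name `VariantCache`, the representation of a variant as a list of tokens, the use of zlib (DEFLATE) for ZIP and SHA-256 for SHA, and the `refresh` rule (drop entries longer than the current best program, which a deletion-only reducer can never generate again) are all hypothetical choices. It does not implement RCC's compact canonical encoding.

```python
import hashlib
import zlib


class VariantCache:
    """Caches property-test results keyed by a compact form of each variant.

    Hypothetical sketch: scheme names mirror STR/ZIP/SHA from the abstract,
    but the concrete algorithms (zlib, SHA-256) are assumptions.
    """

    def __init__(self, scheme="sha"):
        if scheme not in ("str", "zip", "sha"):
            raise ValueError(f"unknown scheme: {scheme}")
        self.scheme = scheme
        self.entries = {}  # key -> (token count, property-test result)
        self.hits = 0      # number of property tests avoided

    def _key(self, tokens):
        # Assumes tokens contain no spaces, so the joined text is unambiguous.
        data = " ".join(tokens).encode("utf-8")
        if self.scheme == "str":
            return data                        # raw text: largest keys
        if self.scheme == "zip":
            return zlib.compress(data)         # lossless, smaller keys
        return hashlib.sha256(data).digest()   # fixed 32 bytes; collisions
                                               # are possible in theory

    def check(self, tokens, property_test):
        """Return the cached result, running property_test only on a miss."""
        key = self._key(tokens)
        if key in self.entries:
            self.hits += 1
            return self.entries[key][1]
        result = property_test(tokens)
        self.entries[key] = (len(tokens), result)
        return result

    def refresh(self, best_size):
        """Drop entries larger than the current best program (assumed stale,
        since a deletion-only reducer never produces a larger variant)."""
        self.entries = {
            key: value
            for key, value in self.entries.items()
            if value[0] <= best_size
        }
```

A reducer would call `cache.check(variant_tokens, property_test)` in place of a direct property test, and call `cache.refresh(len(best_tokens))` each time it commits to a smaller program. The trade-off among the three schemes is the one the abstract describes: smaller keys cost extra computation per lookup but reduce the memory held by the cache.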