Kakute: A Precise, Unified Information Flow Analysis System for Big-data Security.
Jianyu Jiang, Shixiong Zhao, Danish Alsayed, Yuexuan Wang, Heming Cui, Feng Liang, Zhaoquan Gu
DOI: https://doi.org/10.1145/3134600.3134607
2017-01-01
Abstract: Big-data frameworks (e.g., Spark) enable computations on tremendous numbers of data records generated by third parties, causing various security and reliability problems such as information leakage and programming bugs. Existing systems for big-data security (e.g., Titian) track data transformations at the record level, which is too imprecise and coarse-grained for these problems. For instance, when we ran Titian to drill down to the input records that produced a buggy output record, Titian reported 3 to 9 orders of magnitude more input records than the actual ones. Information Flow Tracking (IFT) is a conventional approach for precise information control. However, extant IFT systems are neither efficient nor complete for big-data frameworks, because these frameworks are data-intensive and IFT systems often ignore data flowing across hosts. This paper presents KAKUTE, the first precise, fine-grained information flow analysis system for big data. Our insight on making IFT efficient is that most fields in a data record often have the same IFT tags, and we present two new efficient techniques called Reference Propagation and Tag Sharing. In addition, we design an efficient, complete cross-host information flow propagation approach. Evaluation on seven diverse big-data programs (e.g., WordCount) shows that KAKUTE incurred merely 32.3% overhead on average, even with fine-grained information control enabled. Compared with Titian, KAKUTE precisely drilled down to the actual bug-inducing input records, reducing the number of reported records by 3 to 9 orders of magnitude. KAKUTE's performance overhead is comparable to Titian's. Furthermore, KAKUTE effectively detected 13 real-world security and reliability bugs spanning 4 diverse problems: information leakage, data provenance, programming bugs, and performance bugs. KAKUTE's source code and results are available at https://github.com/hku-systems/kakute.
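The abstract's efficiency insight is that the fields of one record usually carry the same IFT tag, so storing a separate tag for every field wastes memory and propagation work. The Java sketch that follows illustrates only that general idea under stated assumptions: the class `SharedTagRecord`, the encoding of tags as `int` bitsets (one bit per input source), and the lazily allocated per-field array are all hypothetical choices for illustration, not KAKUTE's actual Tag Sharing implementation, and the sketch does not model Reference Propagation or cross-host propagation.

```java
import java.util.Arrays;

// Hypothetical sketch of record-level tag sharing (not KAKUTE's implementation).
// Assumption: a tag is an int bitset, one bit per input source; joining tags is bitwise OR.
final class SharedTagRecord {
    private final int numFields;
    private int sharedTag;   // the single tag used by all fields while perField == null
    private int[] perField;  // allocated lazily, only once some field's tag diverges

    SharedTagRecord(int numFields, int initialTag) {
        this.numFields = numFields;
        this.sharedTag = initialTag;
    }

    int getTag(int field) {
        checkIndex(field);
        return perField == null ? sharedTag : perField[field];
    }

    // Sets one field's tag; stays in shared mode if the new tag equals the shared one.
    void setTag(int field, int tag) {
        checkIndex(field);
        if (perField == null) {
            if (tag == sharedTag) {
                return; // nothing diverged, keep a single shared tag
            }
            perField = new int[numFields];
            Arrays.fill(perField, sharedTag); // materialize per-field tags on divergence
        }
        perField[field] = tag;
    }

    // Joins a tag into every field, e.g. after a record-wide transformation.
    void joinAll(int tag) {
        if (perField == null) {
            sharedTag |= tag; // one update instead of numFields updates
        } else {
            for (int i = 0; i < numFields; i++) {
                perField[i] |= tag;
            }
        }
    }

    boolean isShared() {
        return perField == null;
    }

    private void checkIndex(int field) {
        if (field < 0 || field >= numFields) {
            throw new IndexOutOfBoundsException("field " + field);
        }
    }

    public static void main(String[] args) {
        SharedTagRecord r = new SharedTagRecord(4, 0b01); // all fields tainted by source 0
        r.joinAll(0b10);                                   // still a single shared tag
        System.out.println(r.isShared() + " " + r.getTag(3)); // prints "true 3"
        r.setTag(2, 0b100);                                // field 2 diverges
        System.out.println(r.isShared() + " " + r.getTag(2) + " " + r.getTag(0)); // prints "false 4 3"
    }
}
```

In this sketch a record costs one tag until a field actually diverges, which is the common case the abstract describes; the trade-off is a one-time copy when divergence first occurs.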