Goshawk: Hunting Memory Corruptions Via Structure-Aware and Object-Centric Memory Operation Synopsis

Yunlong Lyu, Yi Fang, Yiwei Zhang, Qibin Sun, Siqi Ma, Elisa Bertino, Kangjie Lu, Juanru Li
DOI: https://doi.org/10.1109/sp46214.2022.9833613
2022-01-01
Abstract: Existing tools for the automated detection of memory corruption bugs are not very effective in practice. They typically recognize only standard memory management (MM) APIs (e.g., malloc and free) and assume a naive paired-use model, in which an allocator is followed by a specific deallocator. However, we observe that programmers very often design their own MM functions and that these functions often manifest two major characteristics: (1) Custom allocator functions perform multi-object or nested allocation, which then requires structure-aware deallocation functions. (2) Custom allocators and deallocators follow an unpaired-use model. A more effective detection thus needs to adapt to those characteristics and capture memory bugs related to non-standard MM behaviors. In this paper, we present an MM-function-aware memory bug detection technique by introducing the concept of a structure-aware and object-centric Memory Operation Synopsis (MOS). A MOS abstractly describes the memory objects of a given MM function, how they are managed by the function, and their structural relations. By utilizing MOS, bug detection can explore much less code yet still handle multi-object or nested allocations without relying on the paired-use model. In addition, to find MM functions extensively and automatically generate MOS for them, we propose a new identification approach that combines natural language processing (NLP) and data flow analysis, which enables the efficient and comprehensive identification of MM functions, even in very large code bases. We implement a MOS-enhanced memory bug detection system, Goshawk, to discover memory bugs caused by complex and custom MM behaviors. We applied Goshawk to well-tested and widely used open source projects, including OS kernels, server applications, and IoT SDKs.
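To make the MOS idea concrete, here is a toy sketch in Python. It is not Goshawk's actual MOS format or detection algorithm. It only assumes that a synopsis can be reduced to the set of access paths an MM function allocates (rooted at its return value `ret`) and frees (rooted at its first argument `arg0`). The functions `create_ctx` and `destroy_ctx`, the `ctx->buf` field, and the trace format are all hypothetical. The checker replays a call trace object by object, so it catches a double free that arises when a nested buffer is released with plain `free` and then released again by the structure-aware deallocator, which is exactly the unpaired-use case a pure malloc/free pairing model would miss.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MOS:
    """Toy memory operation synopsis for one MM function (hypothetical format).

    Access paths are rooted at 'ret' (return value) for allocations and at
    'arg0' (first parameter) for deallocations, e.g. 'ret->buf', 'arg0->buf'.
    """
    name: str
    allocates: tuple[str, ...] = ()
    frees: tuple[str, ...] = ()


def bind(path: str, root: str, var: str) -> str:
    """Rewrite an access path rooted at `root` so it is rooted at `var`."""
    if path == root:
        return var
    if path.startswith(root + "->"):
        return var + path[len(root):]
    raise ValueError(f"path {path!r} is not rooted at {root!r}")


# Hypothetical synopses: the standard pair plus a custom nested allocator
# and its structure-aware deallocator.
SYNOPSES = {
    "malloc": MOS("malloc", allocates=("ret",)),
    "free": MOS("free", frees=("arg0",)),
    "create_ctx": MOS("create_ctx", allocates=("ret", "ret->buf")),
    "destroy_ctx": MOS("destroy_ctx", frees=("arg0->buf", "arg0")),
}


def check(trace: list[tuple[str, str]], synopses: dict[str, MOS]) -> list[str]:
    """Replay (function, variable) steps and report double frees / uses after free.

    For an allocator, `var` receives the return value; for a deallocator,
    `var` is the argument passed in; ("use", path) marks a dereference.
    """
    freed: set[str] = set()
    reports: list[str] = []
    for func, var in trace:
        if func == "use":
            if var in freed:
                reports.append(f"use-after-free: {var}")
            continue
        mos = synopses[func]
        for path in mos.allocates:
            freed.discard(bind(path, "ret", var))  # object is live again
        for path in mos.frees:
            obj = bind(path, "arg0", var)
            if obj in freed:
                reports.append(f"double-free: {obj} in {func}")
            freed.add(obj)
    return reports


buggy = [("create_ctx", "ctx"), ("free", "ctx->buf"), ("destroy_ctx", "ctx")]
print(check(buggy, SYNOPSES))  # → ['double-free: ctx->buf in destroy_ctx']
```

Because the synopsis records that `destroy_ctx` frees `arg0->buf` as well as `arg0`, the checker never needs to inspect the deallocator's body at the call site, which mirrors the abstract's point that MOS lets bug detection explore much less code.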
Goshawk outperforms the state-of-the-art data flow analysis driven bug detection tools by an order of magnitude in analysis speed and the number of accurately identified MM functions, reports the discovered bugs with a developer-friendly, MOS-based description, and successfully detects 92 new double-free and use-after-free bugs.